Date: Thu, 15 Oct 2020 17:52:30 +0100
On 15/10/2020 15:20, Peter Dimov wrote:
> Niall Douglas wrote:
>
>> Your mechanism would have that implicit and thus more subject to
>> unintentional duplication of work performed, whereas the existing
>> mechanism is explicit and less subject to surprise because it forces
>> the user to spell things out.
>
> Conversely, it also scopes the temporary buffer properly by default,
> whereas you could easily duplicate and triplicate the stack space
> required unless you scope the c_str<> variables by hand. But yes, I see
> your point.
I don't know if I'm uncommon, but in the code I write, it is very rare
that I would pass more than two filesystem paths to syscalls in the same
function. If I were about to do so, I'd split out the code into a new
function.
But maybe that's just me. In any case, if the lifetime of c_str ends
before a new c_str is created, compilers reuse the stack storage as you
can see at https://godbolt.org/z/7c8W8W.
Alas, because the syscalls cause the compiler to assume that values have
escaped, if both c_str instances are alive at the same time then the
stack requirements do double: https://godbolt.org/z/bjvqzP
Note that c_str *is* assignable, so you can reuse an existing c_str very
easily across multiple syscalls e.g. https://godbolt.org/z/T3Ka3P
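
To make the three cases above concrete without following the links, here
is a minimal sketch. Note that path_buffer, fake_syscall and the 1024
byte capacity are hypothetical stand-ins invented for illustration; this
is not the c_str interface from the paper, and whether stack slots are
actually shared is up to the compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical stand-in for the proposed c_str (NOT the paper's interface):
// copies a path into an internal stack buffer and zero terminates it.
struct path_buffer {
    static constexpr std::size_t capacity = 1024;  // assumed size
    char buffer[capacity];

    explicit path_buffer(std::string_view path) { assign(path); }

    // Assignable: the same storage is reused for another path.
    path_buffer &operator=(std::string_view path) {
        assign(path);
        return *this;
    }

    const char *c_str() const { return buffer; }

private:
    void assign(std::string_view path) {
        assert(path.size() < capacity);
        std::memcpy(buffer, path.data(), path.size());
        buffer[path.size()] = 0;
    }
};

// Hypothetical stand-in for a syscall taking a zero terminated path.
std::size_t fake_syscall(const char *path) { return std::strlen(path); }

// Case 1: lifetimes do not overlap, so a compiler may reuse one stack slot.
std::size_t total_scoped(std::string_view p1, std::string_view p2) {
    std::size_t n = 0;
    {
        path_buffer a(p1);
        n += fake_syscall(a.c_str());
    }
    {
        path_buffer b(p2);
        n += fake_syscall(b.c_str());
    }
    return n;
}

// Case 2: both buffers are alive together, so two slots are needed.
std::size_t total_overlapping(std::string_view p1, std::string_view p2) {
    path_buffer a(p1);
    path_buffer b(p2);
    return fake_syscall(a.c_str()) + fake_syscall(b.c_str());
}

// Case 3: one buffer, reassigned between the two calls.
std::size_t total_reused(std::string_view p1, std::string_view p2) {
    path_buffer a(p1);
    std::size_t n = fake_syscall(a.c_str());
    a = p2;
    n += fake_syscall(a.c_str());
    return n;
}
```

All three functions return the same result; they differ only in how
many buffers are alive at once.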
>> The Deleter can have state, just pass it in to the constructor.
>
> In R4 it can. I was looking at R3, because, as you may have guessed, I
> had to open it in order to look at the design rationale.
Apologies for that. The inability to pass in Deleter state in R3 was a
pure oversight on my part: it makes no sense to store a templated
Deleter instance without being able to say how to construct it. As I
know you know, editing LaTeX containing code very frequently produces
incorrect or bad code. And, to be honest, the reference implementation
was lagging the paper revision badly until just now.
>> If we insisted on Allocators, we would force a needless extra dynamic
>> memory allocation and memory copy solely because we insisted on
>> Allocators.
>
> I don't see why.
My personal primary motivation is functions such as
https://docs.microsoft.com/en-us/windows/win32/api/winternl/nf-winternl-rtlansistringtounicodestring
or
https://docs.microsoft.com/en-us/windows/win32/api/winternl/nf-winternl-rtlunicodestringtoansistring
which are much more efficient if you let them allocate the destination
buffer for you. They allocate that buffer using the NT kernel RTL
allocator, and you need to call special free functions for each
destination reencoding type.
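
As a rough, portable sketch of why a stateful Deleter fits this pattern:
below, os_convert is a hypothetical stand-in for a routine which
allocates its own destination buffer (malloc stands in for the NT kernel
RTL allocator), and os_free_deleter is a hypothetical Deleter carrying
state. The buffer is adopted as-is, with no second allocation and no
copy into Allocator-owned storage. None of these names come from the
paper or from the Windows API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Hypothetical stand-in for an OS routine which allocates the destination
// buffer itself. std::malloc plays the role of the OS allocator here.
char *os_convert(const char *src, std::size_t *out_len) {
    assert(src != nullptr);
    std::size_t len = std::strlen(src);
    char *dest = static_cast<char *>(std::malloc(len + 1));
    if (dest == nullptr) {
        return nullptr;
    }
    std::memcpy(dest, src, len + 1);  // a real routine would reencode here
    *out_len = len;
    return dest;
}

// Hypothetical stateful Deleter: calls the matching free routine and
// records that it ran. The state is supplied at construction.
struct os_free_deleter {
    int *free_count;
    void operator()(char *p) const {
        std::free(p);  // would be the matching special free function
        ++*free_count;
    }
};

using os_string = std::unique_ptr<char, os_free_deleter>;

// Adopt the buffer the routine allocated; no extra allocation or copy.
os_string convert_adopting(const char *src, int *free_count) {
    std::size_t len = 0;
    return os_string(os_convert(src, &len), os_free_deleter{free_count});
}
```

With an Allocator-only design, the result of os_convert would instead
have to be copied into memory obtained from the Allocator and the
original freed, which is the extra allocation and copy referred to
above.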
You and others might say "just use MultiByteToWideChar() etc", but it's
an order of magnitude slower and confers no advantage whatsoever over
the NT kernel APIs. Also, remember that most portable C++ will use
byte-based paths, so the conversion from char or char8_t to wchar_t will
be unusually common on Windows. Making that as fast as possible confers
real gains for Windows users.
I appreciate all this will seem nitpicky to almost anyone reading this.
However, about 40% of the apparent slowness of the filesystem on
Microsoft Windows is due to hidden path conversions, and fixing this
stuff turns Windows from ~120x slower than Linux into a mere ~75x slower
than Linux for filesystem path based operations. That's a BIG difference
every time you do a cmake configure, for example.
Niall
Received on 2020-10-15 11:52:35