Date: Thu, 15 Oct 2020 14:54:25 +0100
On 15/10/2020 12:37, Peter Dimov wrote:
> Niall Douglas wrote:
>
>> Please find attached draft 2 of proposed normative wording for D1030R4
>> std::filesystem::path_view.
>
> The suggested use pattern is
>
> int open_file(path_view path)
> {
>     path_view::c_str<> p(path);
>     return ::open(p.buffer, O_RDONLY);
> }
>
> but the name c_str implies
>
> int open_file(path_view path)
> {
>     return ::open(path.c_str(), O_RDONLY);
> }
>
> which is both more convenient and less error-prone because it avoids the
> need for another name (`p` in the above).
>
> It's not hard to make the latter work. path_view::c_str() can be a
> function returning c_str_result, which is the current c_str struct by
> another name, and c_str_result can have a conversion to T const*.
We previously discussed this within the SGs and LEWG. Indeed, I remember
Titus, then chair, saying that he had never seen such a design pattern
presented in his time there, and that it might be controversial. Since
then it has had two (three?) rounds before LEWG.
The alternative mechanism you propose suffers from a number of drawbacks
relative to what is currently proposed:
1. In the default configuration, a non-trivial amount of stack is
consumed by the rendition of path views to the native filesystem
encoding. There is also, potentially, a very significant amount of work
performed relative to the syscall invoked. Your mechanism would make that
work implicit, and thus more prone to unintentional duplication, whereas
the existing mechanism is explicit and less prone to surprise because it
forces the user to spell things out.
2. In POSIX filesystem APIs it is typical for the syscall to take a
pointer to the filesystem path directly, so the lifetime of the returned
temporary is sufficient. Other filesystem APIs are not like this: e.g.
you might write the pointer to the filesystem path into a UNICODE_STRING
structure on the stack, a pointer to which is then passed to the
syscall. We therefore need lifetime to be managed more explicitly, and
not to be temporary-orientated.
3. What eventually won LEWG over to this design is that the number of
people who ever need to write code which feeds path views to syscalls is
quite small in proportion to the overall number of path view users. On
the same rationale, at its last meeting LEWG gave feedback that c_str's
constructor ought not to default the zero termination parameter, and
should instead force the user to always spell out the zero termination.
Yes, this is inconvenient, but the inconvenience is confined to a small
proportion of end users in a very limited use case, where choosing the
wrong value for that parameter is relatively expensive. I can speak from
implementation experience: defaulting that parameter led to an
unintentional choice of the wrong value at least twice in code I wrote,
so I accepted that feedback.
4. Finally, c_str is only one of many ways to render path view content.
A lot of path view users won't use c_str at all: e.g. anyone who needs
to store a path has no need to use c_str, because there is no gain in
doing so (just visit the backing storage and construct a path directly
from the backing representation). There are therefore good grounds for
arguing that struct c_str ought to be less confusingly named. I remember
a fair bit of argument on that in WG21, and no consensus emerged for
changing its name (retaining the name also has advantages in terms of
muscle memory, and its being a struct rather than a function is a useful
reminder that path views are not string views, and there is no chance of
unintentional misuse).
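Points 1 and 2 can be sketched with toy types. Everything below (toy_path_view, toy_c_str, toy_native_string, toy_syscall, open_like, render_count) is hypothetical and invented for illustration; it is not the proposed path_view API. The sketch assumes a UNICODE_STRING-like descriptor that merely stores a pointer to the path bytes: a named rendering object keeps the buffer alive across several statements and makes the single rendition visible at the call site, which a temporary with an implicit conversion would not.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// All names here are hypothetical toys, not the proposed path_view API.
struct toy_path_view {
  std::string_view text;
};

// Counts how often the (potentially expensive) rendition runs.
inline int render_count = 0;

// Toy analogue of a named path_view::c_str object: renders once, and owns
// the zero-terminated result for exactly as long as the named object lives.
struct toy_c_str {
  std::string storage;  // owns the rendered bytes (stands in for a real encoding conversion)
  const char* buffer;   // points into storage
  std::size_t length;

  explicit toy_c_str(toy_path_view v)
      : storage(v.text), buffer(storage.c_str()), length(storage.size()) {
    ++render_count;
  }
  toy_c_str(const toy_c_str&) = delete;  // buffer must never outlive storage
  toy_c_str& operator=(const toy_c_str&) = delete;
};

// Toy analogue of a UNICODE_STRING-style descriptor: it only stores a
// pointer, so the bytes it points at must outlive every use of it.
struct toy_native_string {
  std::size_t length;
  const char* data;
};

// Toy "syscall" which takes a pointer to the descriptor rather than a
// pointer to the path bytes themselves.
inline std::size_t toy_syscall(const toy_native_string* s) {
  return std::string_view(s->data, s->length).size();
}

inline std::size_t open_like(toy_path_view path) {
  toy_c_str p(path);                           // one rendition, visible at the call site
  toy_native_string desc{p.length, p.buffer};  // p outlives desc
  // With an implicitly converting temporary instead, something like
  //   toy_native_string desc{len, path.c_str()};
  // would leave desc.data dangling once the temporary is destroyed at the
  // end of that declaration, and obtaining len would likely need a second
  // rendition.
  return toy_syscall(&desc);
}
```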
> In addition, while (the current) c_str allows both the allocation and
> the deallocation to be customized, allocation is overridden by passing a
> function object to the constructor, whereas deallocation is overridden by
> a Deleter template parameter (which can have no state.)
The Deleter can have state: just pass it in to the constructor.
> It would be more symmetric and more consistent with the rest of the
> standard library if both were customized via the use of an Allocator.
> (Which could also allow the use of memory_resource*.)
You can wrap an Allocator, including a memory_resource, into the
mechanism provided.
The mechanism exists to permit the use of platform-specific facilities
which always perform a dynamic memory allocation. If we insisted on
Allocators, we would force a needless extra dynamic memory allocation
and memory copy solely for the sake of using Allocators. Given that 99%
of users will use the default allocation mechanism, and that STL
Allocators can still be used if desired, it is a better tradeoff to let
the default allocation mechanism go twice as quickly.
You may notice that we aren't being innovative here: we use exactly the
same mechanism as unique_ptr does. I'm not actually familiar with the
history of why unique_ptr doesn't use Allocators, but here there is a
real win on some platforms from replicating unique_ptr's approach, and
very little cost beyond some one-off typing for users binding
path_view::c_str into an Allocator-specialised derivative.
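A minimal sketch of how a std::pmr::memory_resource might be wrapped into an allocation callable plus a stateful Deleter, in the style of std::unique_ptr. toy_c_str, pmr_deleter and make_pmr_c_str are hypothetical simplifications written for this illustration; they assume a constructor taking the source text, an allocation callable and a Deleter instance, and the actual proposed c_str interface (including its zero termination parameter) differs and is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <memory_resource>
#include <new>
#include <string_view>
#include <utility>

// Hypothetical, heavily simplified stand-in for path_view::c_str: the
// allocation callable is passed to the constructor, and the Deleter is a
// template parameter whose instance (with any state) is also passed to the
// constructor, mirroring std::unique_ptr<T, Deleter>.
template <class Deleter>
struct toy_c_str {
  std::unique_ptr<char[], Deleter> owner;
  const char* buffer = nullptr;

  template <class Allocate>
  toy_c_str(std::string_view src, Allocate&& allocate, Deleter d)
      : owner(static_cast<char*>(allocate(src.size() + 1)), std::move(d)) {
    // Guards against malloc-style callables that return null on failure.
    if (!owner) throw std::bad_alloc();
    std::memcpy(owner.get(), src.data(), src.size());
    owner.get()[src.size()] = '\0';  // zero terminate
    buffer = owner.get();
  }
};

// A stateful Deleter: it remembers which memory_resource allocated the
// buffer and how large the allocation was, since deallocate() needs both.
struct pmr_deleter {
  std::pmr::memory_resource* mr;
  std::size_t bytes;
  void operator()(char* p) const noexcept {
    mr->deallocate(p, bytes, alignof(char));
  }
};

// Wraps a memory_resource into the (allocation callable, Deleter) pair.
inline toy_c_str<pmr_deleter> make_pmr_c_str(std::string_view src,
                                             std::pmr::memory_resource* mr) {
  const std::size_t bytes = src.size() + 1;
  return toy_c_str<pmr_deleter>(
      src, [mr](std::size_t n) { return mr->allocate(n, alignof(char)); },
      pmr_deleter{mr, bytes});
}
```

The Deleter here carries two pieces of state (the resource and the allocation size), which is why it is passed in as an object rather than only named as a type.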
Niall
Received on 2020-10-15 08:54:29