Date: Wed, 3 Jul 2019 12:07:46 -0400
On 7/3/19 11:11 AM, Niall Douglas wrote:
> Dear SG16,
>
> Titus would like to know if SG16 has no objection to LEWG reviewing
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1030r2.pdf
> "std::filesystem::path_view" at the Cologne meeting? SG16 approval is
> sought because P1030 depends on P0482 char8_t.
Disclaimer: I have not yet read P1030R2, though I have read the prior
revisions.
SG16 reviewed prior revisions of P1030 at several of our telecons.
Notes on those reviews are available at the following links (search for
'path_view'):
* https://github.com/sg16-unicode/sg16-meetings/blob/master/README.md#july-11th-2018
* https://github.com/sg16-unicode/sg16-meetings/blob/master/README.md#may-30th-2018
The only poll we conducted was during the May 30th, 2018 meeting. Polls
at our telecons are informal.
I would like for SG16 to re-review P1030 at Cologne and take some
official polls. I added it to our agenda at
http://wiki.edg.com/bin/view/Wg21cologne2019/SG16.
Niall, will you be present in Cologne?
Note that the SG16 Unicode Direction paper was updated for the
pre-meeting mailing with relevant discussion concerning encoding of file
names. Please take a look at P1238R1
(http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1238r1.html)
sections 2.7 and 6.3 and provide any feedback.
>
> To explain the dependency, path_view really wants to avoid ever copying
> memory. This means we cannot use the same API as filesystem::path. So,
> if on Windows you write:
>
> path_view(u16"c:/some/path");
>
> ... then the literal string gets passed directly through to the wide
> character Windows syscalls, uncopied.
Presumably via a reinterpret_cast somewhere along the line? That would
be UB outside the standard library.
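
To make the concern concrete, here is a minimal sketch, not taken from P1030. The helper name to_wide_copy is hypothetical. It assumes a platform such as Windows where wchar_t holds UTF-16 code units; elsewhere the copy still compiles but is only meaningful for code units that need no conversion. The well-defined route copies, which is exactly what path_view wants to avoid; the zero-copy route is the cast noted in the trailing comment.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// char16_t and wchar_t are distinct types, even where both are 16 bits wide.
static_assert(!std::is_same<char16_t, wchar_t>::value,
              "char16_t and wchar_t are never the same type");

// Hypothetical helper (not part of P1030): converts code unit by code unit.
// Well-defined, but it allocates and copies.
inline std::wstring to_wide_copy(const std::u16string& s) {
    std::wstring out;
    out.reserve(s.size());
    for (char16_t c : s) {
        out.push_back(static_cast<wchar_t>(c));
    }
    return out;
}

// The zero-copy alternative would look like
//   reinterpret_cast<const wchar_t*>(s.data())
// Reading through that pointer accesses char16_t objects through a wchar_t
// lvalue, which user code is not permitted to do (undefined behaviour).
```

Note that the standard prefix for a UTF-16 literal is u, so a call would be written to_wide_copy(u"c:/some/path").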
> If on POSIX, it would undergo a
> just-in-time downconversion to UTF-8.
This seems to (necessarily) violate the "avoid ever copying memory"
requirement. It also assumes an encoding of filenames that should be
left up to the implementation (I would be opposed to specifying a
conversion to UTF-8 here; that would be inconsistent with
std::filesystem::path).
>
> This works by path_view inspecting the char literal passed to it for its
> type. It refuses to accept `char`, it will only accept byte [1],
> char8_t, char16_t and char32_t. In other words, you must explicitly
> specify the UTF encoding for path view input.
I personally don't feel it is necessary to refuse 'char', though I
understand the motivation. The inconsistency with std::filesystem::path
is more concerning to me than the use of 'char' to mean "bytes". This
is something worth polling in SG16.
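
For discussion purposes, the type filtering described above could be sketched roughly as follows. The trait name is_path_view_source and the function accepts_source are hypothetical and do not come from P1030; the list of accepted types is the one given in Niall's message. The sketch assumes C++17 for std::byte and only enables char8_t when the compiler supports it.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical trait: true only for the code unit types the message lists.
template <class T> struct is_path_view_source : std::false_type {};
template <> struct is_path_view_source<std::byte> : std::true_type {};
#ifdef __cpp_char8_t
template <> struct is_path_view_source<char8_t> : std::true_type {};
#endif
template <> struct is_path_view_source<char16_t> : std::true_type {};
template <> struct is_path_view_source<char32_t> : std::true_type {};

// Hypothetical query: reports whether a pointer to T would be accepted.
template <class T>
constexpr bool accepts_source(const T*) {
    return is_path_view_source<T>::value;
}

// Plain char is rejected because it carries no statement about the encoding.
static_assert(!is_path_view_source<char>::value, "char is refused");
```

std::filesystem::path, by contrast, accepts char sources and interprets them in the native narrow encoding, which is the inconsistency a poll would address.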
>
> Is SG16 happy for LEWG to review P1030?
Per P1253
(http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1253r0.html),
there is sufficient justification for SG16 review prior to LEWG review.
However, I also think there is sufficient content in the proposal that
does not require review by SG16 that LEWG could make progress reviewing
those parts while awaiting SG16 review (I suspect LEWG will request
changes). Ideally, SG16 will review this at the meeting on Wednesday
and LEWG will review it with SG16 input included Thursday or Friday.
But that scheduling is, of course, up to the LEWG chair.
Niall, assuming you will be able to attend SG16 in Cologne, please come
prepared with a specific list of questions/polls you would like feedback
on. I'll have a list as well (though I don't yet).
>
> Niall
>
> [1]: Raw bytes are accepted because most filesystems will actually
> accept raw binary bytes (minus a few reserved values such as '/' or 0)
> as path identifiers without issue. There is no interpretation done based
> on Unicode. Thus, not supporting raw bytes makes many valid filenames
> impossible to work with.
Agreed, though for some filesystems and filesystem APIs, it is even
worse (e.g., '\' (0x5C) can appear as a trailing code unit for Shift-JIS
file names for Windows ANSI file APIs and does not indicate a path
separator in that case).
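
A small sketch of the Shift-JIS case, assuming the lead byte ranges used by Windows code page 932 (0x81-0x9F and 0xE0-0xFC). The function names are hypothetical. The example character is U+8868, encoded as 0x95 0x5C, so a byte-wise search for '\' finds a separator in the middle of the character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if b starts a two-byte Shift-JIS character (code page 932 ranges).
inline bool is_sjis_lead_byte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Returns the index of the last 0x5C byte that is a real path separator,
// or std::string::npos if there is none.
inline std::size_t last_separator_sjis(const std::string& path) {
    std::size_t last = std::string::npos;
    for (std::size_t i = 0; i < path.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(path[i]);
        if (is_sjis_lead_byte(b) && i + 1 < path.size()) {
            ++i;  // skip the trail byte, even when it is 0x5C
        } else if (b == 0x5C) {
            last = i;
        }
    }
    return last;
}
```

For the bytes 'C' ':' 0x5C 0x95 0x5C, last_separator_sjis reports index 2, whereas std::string::rfind('\\') reports index 4, the trail byte of the character.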
Tom.
> _______________________________________________
> SG16 Unicode mailing list
> Unicode_at_[hidden]
> http://www.open-std.org/mailman/listinfo/unicode
Received on 2019-07-03 18:16:48