On 7/3/19 11:11 AM, Niall Douglas wrote:
Dear SG16,

Titus would like to know if SG16 has no objection to LEWG reviewing
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1030r2.pdf
"std::filesystem::path_view" at the Cologne meeting? SG16 approval is
sought because P1030 depends on P0482 char8_t.

Disclaimer: I have not yet read P1030R2, though I have read the prior revisions.

SG16 reviewed prior revisions of P1030 at several of our telecons.  Notes on those reviews are available at the following links (search for 'path_view'):

The only poll we conducted was during the May 30th, 2018 meeting.  Polls at our telecons are informal.

I would like for SG16 to re-review P1030 at Cologne and take some official polls.  I added it to our agenda at http://wiki.edg.com/bin/view/Wg21cologne2019/SG16.

Niall, will you be present in Cologne?

Note that the SG16 Unicode Direction paper was updated for the pre-meeting mailing with relevant discussion concerning encoding of file names.  Please take a look at P1238R1 (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1238r1.html) sections 2.7 and 6.3 and provide any feedback.


To explain the dependency, path_view really wants to avoid ever copying
memory. This means we cannot use the same API as filesystem::path. So,
if on Windows you write:

path_view(u"c:/some/path");

... then the literal string gets passed directly through to the wide
character Windows syscalls, uncopied.
Presumably via a reinterpret_cast somewhere along the line?  That would be UB outside the standard library, since char16_t and wchar_t are distinct types even where both are 16 bits wide.
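For comparison, here is a minimal sketch of the well-defined way to hand char16_t data to an API that takes wchar_t: copy it element by element. The helper name copy_to_wide is hypothetical. It assumes wchar_t holds UTF-16 code units (as on Windows). Where wchar_t is 32 bits, the same code widens each code unit and does not decode surrogate pairs. The point is that the portable route costs exactly the copy that path_view is trying to avoid.

```cpp
#include <string>
#include <string_view>
#include <type_traits>

// char16_t and wchar_t are always distinct types, whatever their widths.
static_assert(!std::is_same_v<char16_t, wchar_t>);

// Hypothetical helper: the aliasing-safe route is an element-wise copy.
// Assumes wchar_t holds UTF-16 code units (as on Windows); where wchar_t is
// 32 bits this widens each code unit and does not decode surrogate pairs.
std::wstring copy_to_wide(std::u16string_view s) {
    return std::wstring(s.begin(), s.end());
}

// By contrast, reading the data through
//   reinterpret_cast<const wchar_t*>(s.data())
// would be undefined behavior in user code (aliasing violation).
```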
If on POSIX, it would undergo a
just-in-time downconversion to UTF-8.
This seems to (necessarily) violate the "avoid ever copying memory" requirement.  It also assumes an encoding of filenames that should be left up to the implementation (I would be opposed to specifying a conversion to UTF-8 here; that would be inconsistent with std::filesystem::path).
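The copy is unavoidable because transcoding writes a new sequence of code units. A minimal UTF-16 to UTF-8 transcoder makes this visible: the result lives in a freshly allocated std::string, not in the caller's literal. The helper utf16_to_utf8 is hypothetical, and replacing unpaired surrogates with U+FFFD is a policy chosen only for this sketch. It also hard-codes UTF-8 as the target encoding, which is the assumption questioned above.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: transcode UTF-16 to UTF-8 into a new buffer.
// Unpaired surrogates become U+FFFD (a policy choice for this sketch).
std::string utf16_to_utf8(std::u16string_view in) {
    std::string out;
    out.reserve(in.size());  // lower bound; non-ASCII input grows past it
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High surrogate followed by low surrogate: combine them.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate
        }
        // Emit 1 to 4 UTF-8 code units depending on the code point's range.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```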

This works by path_view inspecting the type of the character literal
passed to it. It refuses to accept `char`; it will only accept `byte` [1],
`char8_t`, `char16_t` and `char32_t`. In other words, you must explicitly
specify the encoding of path_view input.
I personally don't feel it is necessary to refuse 'char', though I understand the motivation.  The inconsistency with std::filesystem::path is more concerning to me than the use of 'char' to mean "bytes".  This is something worth polling in SG16.
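To make the type-based dispatch concrete, here is a minimal sketch of a non-owning view that records which code unit type it was given, stores only the caller's pointer and length, and rejects `char` at compile time. Everything here (is_accepted_unit_v, unit_kind, toy_path_view) is hypothetical and is not the API proposed in P1030. char8_t is only accepted when the compiler advertises it via __cpp_char8_t.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical: code unit types the view accepts. `char` is intentionally
// absent because its type does not state an encoding.
template <class T>
inline constexpr bool is_accepted_unit_v =
    std::is_same_v<T, std::byte> ||
#if defined(__cpp_char8_t)
    std::is_same_v<T, char8_t> ||
#endif
    std::is_same_v<T, char16_t> || std::is_same_v<T, char32_t>;

enum class unit_kind { byte, utf8, utf16, utf32 };

class toy_path_view {
public:
    // Pointer + length: stores the caller's pointer; nothing is copied.
    template <class CharT, class = std::enable_if_t<is_accepted_unit_v<CharT>>>
    toy_path_view(const CharT* p, std::size_t n) noexcept
        : data_(p), size_(n), kind_(kind_of<CharT>()) {}

    // String literal convenience: drops the terminating NUL. Not offered for
    // std::byte, whose arrays are not NUL-terminated text.
    template <class CharT, std::size_t N,
              class = std::enable_if_t<is_accepted_unit_v<CharT> &&
                                       !std::is_same_v<CharT, std::byte>>>
    toy_path_view(const CharT (&lit)[N]) noexcept : toy_path_view(lit, N - 1) {}

    const void* data() const noexcept { return data_; }
    std::size_t size() const noexcept { return size_; }  // in code units
    unit_kind kind() const noexcept { return kind_; }

private:
    template <class CharT>
    static constexpr unit_kind kind_of() noexcept {
        if constexpr (std::is_same_v<CharT, std::byte>) return unit_kind::byte;
        else if constexpr (std::is_same_v<CharT, char16_t>) return unit_kind::utf16;
        else if constexpr (std::is_same_v<CharT, char32_t>) return unit_kind::utf32;
        else return unit_kind::utf8;  // only char8_t is left
    }

    const void* data_;
    std::size_t size_;
    unit_kind kind_;
};
```

With this shape, `toy_path_view(u"c:/some/path")` compiles and refers to the literal in place, while `toy_path_view("c:/some/path")` fails to compile. Relaxing that to accept `char` would be a one-line change to is_accepted_unit_v, which is why the question is one of policy rather than mechanism.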

Is SG16 happy for LEWG to review P1030?

Per P1253 (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1253r0.html), there is sufficient justification for SG16 review prior to LEWG review.  However, I also think there is sufficient content in the proposal that does not require review by SG16 that LEWG could make progress reviewing those parts while awaiting SG16 review (I suspect LEWG will request changes).  Ideally, SG16 will review this at the meeting on Wednesday and LEWG will review it, with SG16 input included, on Thursday or Friday.  But that scheduling is, of course, up to the LEWG chair.

Niall, assuming you will be able to attend SG16 in Cologne, please come prepared with a specific list of questions/polls you would like feedback on.  I'll have a list as well (though I don't yet).


Niall

[1]: Raw bytes are accepted because most filesystems will actually
accept raw binary bytes (minus a few reserved values such as '/' or 0)
as path identifiers without issue. There is no interpretation done based
on Unicode. Thus, not supporting raw bytes makes many valid filenames
impossible to work with.

Agreed, though for some filesystems and filesystem APIs it is even worse (e.g., '\' (0x5C) can appear as the trailing byte of a double-byte Shift-JIS character in file names passed to the Windows ANSI file APIs, in which case it does not indicate a path separator).
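A concrete instance: 表 (U+8868) is encoded in Shift-JIS as 0x95 0x5C, so a naive byte scan finds a "separator" in the middle of the character. The sketch below contrasts that with a scan that knows the lead-byte ranges of Windows code page 932 (0x81-0x9F and 0xE0-0xFC). The function names are hypothetical, and the ranges are hard-coded only for illustration; real Windows code would more likely ask the OS (e.g. IsDBCSLeadByteEx) for the active code page.

```cpp
#include <cstddef>
#include <string_view>

// Naive scan: treats every 0x5C byte as a path separator.
std::size_t count_separators_naive(std::string_view bytes) {
    std::size_t n = 0;
    for (char c : bytes)
        if (c == '\\') ++n;
    return n;
}

// Assumes code page 932: these bytes start a two-byte character.
bool is_cp932_lead_byte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Lead-byte-aware scan: the byte after a lead byte is a trail byte and is
// never a separator, even when it equals 0x5C.
std::size_t count_separators_cp932(std::string_view bytes) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (is_cp932_lead_byte(b)) {
            ++i;  // skip the trail byte
        } else if (b == 0x5C) {
            ++n;
        }
    }
    return n;
}
```

For the bytes of "dir\表.txt" in Shift-JIS, the naive scan counts two separators and the lead-byte-aware scan counts one.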

Tom.

_______________________________________________
SG16 Unicode mailing list
Unicode@isocpp.open-std.org
http://www.open-std.org/mailman/listinfo/unicode