Dear SG16,

Titus would like to know whether SG16 has any objection to LEWG reviewing http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1030r2.pdf "std::filesystem::path_view" at the Cologne meeting. SG16 approval is sought because P1030 depends on P0482 (char8_t).
Disclaimer: I have not yet read P1030R2, though I have read the
SG16 reviewed prior revisions of P1030 at several of our
telecons. Notes on those reviews are available at the following
links (search for 'path_view'):
The only poll we conducted was during the May 30th, 2018 meeting. Polls at our telecons are informal.
I would like for SG16 to re-review P1030 at Cologne and take some official polls. I added it to our agenda at http://wiki.edg.com/bin/view/Wg21cologne2019/SG16.
Niall, will you be present in Cologne?
Note that the SG16 Unicode Direction paper was updated for the
pre-meeting mailing with relevant discussion concerning encoding
of file names. Please take a look at P1238R1
sections 2.7 and 6.3 and provide any feedback.
To explain the dependency, path_view really wants to avoid ever copying memory. This means we cannot use the same API as filesystem::path. So, if on Windows you write: path_view(u"c:/some/path"); ... then the literal string gets passed directly through to the wide character Windows syscalls, uncopied.

Presumably via a reinterpret_cast somewhere along the line? That would be UB outside the standard library.
If on POSIX, it would undergo a just-in-time downconversion to UTF-8.

This seems to (necessarily) violate the "avoid ever copying memory" requirement. It also assumes an encoding of filenames that should be left up to the implementation (I would be opposed to specifying a conversion to UTF-8 here; that would be inconsistent with std::filesystem::path).
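For illustration only, here is a minimal sketch of what a UTF-16 to UTF-8 down-conversion involves. It is not taken from P1030; `utf8_result`, `append_utf8`, and `utf16_to_utf8` are hypothetical names, the 64-byte capacity is arbitrary, and replacing unpaired surrogates with U+FFFD is an assumption of this sketch. Note that the output necessarily lives in a separate buffer, which is the copy being discussed above.

```cpp
#include <cstddef>

// Hypothetical fixed-capacity output buffer (not a type from P1030).
struct utf8_result {
    unsigned char bytes[64] = {};
    std::size_t size = 0;
};

// Append one Unicode code point to the buffer as 1 to 4 UTF-8 code units.
constexpr void append_utf8(utf8_result& out, char32_t cp) {
    if (cp < 0x80) {
        out.bytes[out.size++] = static_cast<unsigned char>(cp);
    } else if (cp < 0x800) {
        out.bytes[out.size++] = static_cast<unsigned char>(0xC0 | (cp >> 6));
        out.bytes[out.size++] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out.bytes[out.size++] = static_cast<unsigned char>(0xE0 | (cp >> 12));
        out.bytes[out.size++] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
        out.bytes[out.size++] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
    } else {
        out.bytes[out.size++] = static_cast<unsigned char>(0xF0 | (cp >> 18));
        out.bytes[out.size++] = static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F));
        out.bytes[out.size++] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
        out.bytes[out.size++] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
    }
}

// Convert n UTF-16 code units to UTF-8. Conversion stops early once fewer
// than 4 bytes of capacity remain (a simplification of this sketch).
constexpr utf8_result utf16_to_utf8(const char16_t* s, std::size_t n) {
    utf8_result out{};
    for (std::size_t i = 0; i < n && out.size + 4 <= sizeof(out.bytes); ++i) {
        char32_t cp = s[i];
        const bool is_high = cp >= 0xD800 && cp <= 0xDBFF;
        if (is_high && i + 1 < n && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into one supplementary code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            // Assumption: unpaired surrogates become U+FFFD.
            cp = 0xFFFD;
        }
        append_utf8(out, cp);
    }
    return out;
}
```

With this sketch, `utf16_to_utf8(u"\u00e9", 1)` yields the two bytes 0xC3 0xA9, and a surrogate pair such as `u"\U0001F600"` yields four bytes.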
This works by path_view inspecting the char literal passed to it for its type. It refuses to accept `char`; it will only accept byte, char8_t, char16_t and char32_t. In other words, you must explicitly specify the UTF encoding for path view input.

I personally don't feel it is necessary to refuse 'char', though I understand the motivation. The inconsistency with std::filesystem::path is more concerning to me than the use of 'char' to mean "bytes". This is something worth polling in SG16.
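To make the design under discussion concrete, here is a rough sketch of a non-owning view that selects on the code unit type of its input and rejects plain `char`. It assumes C++17 or later, and none of the names (`is_accepted_code_unit`, `view_sketch`, `make_view_sketch`) come from P1030; the actual proposed interface may well differ.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical trait: the code unit types this sketch accepts.
// Plain char is deliberately absent.
template <class T> struct is_accepted_code_unit : std::false_type {};
template <> struct is_accepted_code_unit<std::byte> : std::true_type {};
#ifdef __cpp_char8_t
template <> struct is_accepted_code_unit<char8_t> : std::true_type {};
#endif
template <> struct is_accepted_code_unit<char16_t> : std::true_type {};
template <> struct is_accepted_code_unit<char32_t> : std::true_type {};

// Hypothetical non-owning view: it stores a pointer and a length,
// and never copies the characters it refers to.
template <class CharT>
class view_sketch {
    static_assert(is_accepted_code_unit<CharT>::value,
                  "use std::byte, char8_t, char16_t or char32_t");
    const CharT* data_;
    std::size_t size_;

public:
    constexpr view_sketch(const CharT* data, std::size_t size)
        : data_(data), size_(size) {}
    constexpr const CharT* data() const { return data_; }
    constexpr std::size_t size() const { return size_; }
};

// Deduce the code unit type from a string literal.
// N includes the terminating null, which the view excludes.
template <class CharT, std::size_t N>
constexpr view_sketch<CharT> make_view_sketch(const CharT (&literal)[N]) {
    return view_sketch<CharT>(literal, N - 1);
}

// make_view_sketch("c:/some/path");  // would not compile: char is rejected
```

Here `make_view_sketch(u"c:/some/path")` produces a `view_sketch<char16_t>` whose `data()` points at the literal itself. Handing that pointer to a wide-character Windows API would still require a conversion from `const char16_t*` to `const wchar_t*`, which is the reinterpret_cast concern raised earlier.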
Is SG16 happy for LEWG to review P1030?
there is sufficient justification for SG16 review prior to LEWG
review. However, I also think there is sufficient content in the
proposal that does not require review by SG16 that LEWG could make
progress reviewing those parts while awaiting SG16 review (I
suspect LEWG will request changes). Ideally, SG16 will review
this at the meeting on Wednesday and LEWG will review it with SG16
input included Thursday or Friday. But that scheduling is, of
course, up to the LEWG chair.
Niall, assuming you will be able to attend SG16 in Cologne,
please come prepared with a specific list of questions/polls you
would like feedback on. I'll have a list as well (though I don't
Niall: Raw bytes are accepted because most filesystems will actually accept raw binary bytes (minus a few reserved values such as '/' or 0) as path identifiers without issue. There is no interpretation done based on Unicode. Thus, not supporting raw bytes makes many valid filenames impossible to work with.
Agreed, though for some filesystems and filesystem APIs, it is
even worse (e.g., '\' (0x5C) can appear as a trailing code unit
for Shift-JIS file names for Windows ANSI file APIs and does not
indicate a path separator in that case).
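As a small illustration of the Shift-JIS case, the sketch below compares a naive byte scan for 0x5C with a scan that skips the trail byte of each double-byte character. The function names are hypothetical, and the lead byte ranges (0x81 to 0x9F and 0xE0 to 0xFC) are the ones commonly documented for Shift-JIS / Windows code page 932; treat them as an assumption of this sketch rather than a complete decoder.

```cpp
#include <cstddef>

// Assumed Shift-JIS lead byte ranges: 0x81..0x9F and 0xE0..0xFC.
constexpr bool is_sjis_lead_byte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Naive scan: treats every 0x5C byte as a backslash separator.
constexpr std::size_t count_backslashes_naive(const unsigned char* p,
                                              std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (p[i] == 0x5C) ++count;
    }
    return count;
}

// Encoding-aware scan: a 0x5C that is the trail byte of a double-byte
// character is part of that character, not a separator.
constexpr std::size_t count_separators_sjis(const unsigned char* p,
                                            std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (is_sjis_lead_byte(p[i]) && i + 1 < n) {
            ++i;  // skip the trail byte, whatever its value
        } else if (p[i] == 0x5C) {
            ++count;
        }
    }
    return count;
}
```

For the byte sequence 0x95 0x5C 0x5C 0x61 (a double-byte character whose trail byte is 0x5C, then a real backslash, then 'a'), the naive scan reports two separators while the encoding-aware scan reports one.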