Re: [SG16-Unicode] What is the proper term for the locale dependent run-time character set/encoding used for the character classification and conversion functions?

From: Tom Honermann <tom_at_[hidden]>
Date: Mon, 12 Aug 2019 22:25:35 -0400
I agree with this (mostly), but would prefer not to discuss further in
this thread. The only reason I included the filesystem references is
because the wording there uses "native" for an encoding that is related
to (though distinct from) the encodings referenced in the codecvt and ctype
wording, where "native" is also used. This suggests that "native"
serves (or should serve) a role in naming these run-time encodings, or
is a source of conflation (or both).

Tom.

On 8/12/19 5:08 PM, Niall Douglas wrote:
>> 1. [fs.path.type.cvt]p1 <http://eel.is/c++draft/fs.path.type.cvt#1>:
>> (though the definition provided here appears to be specific to path
>> names).
>> "The /native encoding/ of an ordinary character string is the
>> operating system dependent current encoding for path names. The
>> /native encoding/ for wide character strings is the
>> implementation-defined execution wide-character set encoding."
> We discussed the problems with the choice of normative wording in
> http://eel.is/c++draft/fs.class.path#fs.path.cvt, if you remember,
> during SG16's discussion of filesystem::path_view.
>
> The problem is that filesystem paths have different encoding and
> interpretation per-path-component i.e. for a path
>
> /A/B/C/D
>
> ... A, B, C and D may each have its own, individual, encoding and
> interpretation depending on the mount points and filesystems configured
> on the current system. This is not what is suggested by the current
> normative wording, which appears to think that some mapping exists
> between C++ paths and OS kernel paths.
>
> There *is* a mapping, but it is 100% C++-side. The OS kernel generally
> consumes arrays of bytes.
>
> A more correct normative wording would more clearly separate these two
> kinds of path representation. OS kernel paths are arrays of `byte`, but
> with certain implementation-defined byte sequences not permitted. C++
> paths can be in char, wchar_t, char8_t, char16_t, char32_t etc, and
> there are well defined conversions between those C++ paths and the array
> of bytes supplied to the OS kernel. The standard can say nothing useful
> about how the OS kernel may interpret the byte array C++ supplies to it.
>
> If path_view starts the standards track, I'll need to propose a document
> fixing up http://eel.is/c++draft/fs.class.path#fs.path.cvt in any case.
> But to come back to your original question, I think that you ought to
> split off filesystem paths from everything else, consider them separate,
> and then I think you'll find it much easier to make the non-path
> normative wording more consistent.
>
> Niall
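
Editorial illustration (not part of the original message): a minimal C++17 sketch of the separation described in the quoted text, assuming only the standard std::filesystem::path interface. The helper names `components` and `native_bytes` are hypothetical and exist only for this example. The first lists the components of a path, each of which the quoted message says may be interpreted differently depending on mount points. The second exposes the raw bytes of the path's native representation, which is the C++-side view of what would be handed to the OS. Nothing here inspects how a kernel actually interprets those bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: list the components of a path in iteration order.
// For an absolute path the root directory is reported as its own element.
std::vector<std::string> components(const fs::path& p) {
    std::vector<std::string> out;
    for (const auto& part : p) {
        out.push_back(part.generic_string());
    }
    return out;
}

// Hypothetical helper: copy the native representation as raw bytes.
// fs::path::value_type is char on POSIX systems and wchar_t on Windows,
// so the byte count depends on the platform.
std::vector<unsigned char> native_bytes(const fs::path& p) {
    const fs::path::string_type& s = p.native();
    const std::size_t n = s.size() * sizeof(fs::path::value_type);
    const auto* first = reinterpret_cast<const unsigned char*>(s.data());
    std::vector<unsigned char> out(first, first + n);
    assert(out.size() == n);  // sanity check on the copy
    return out;
}
```

Calling `components(fs::path("/A/B/C/D"))` yields the root directory followed by A, B, C and D as separate elements, while `native_bytes` returns one flat byte sequence with no per-component encoding information attached.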

Received on 2019-08-13 04:25:41