Re: [SG16-Unicode] What is the proper term for the locale dependent run-time character set/encoding used for the character classification and conversion functions?

From: Niall Douglas <s_sourceforge_at_[hidden]>
Date: Mon, 12 Aug 2019 22:08:30 +0100
> 1. [fs.path.type.cvt]p1 <http://eel.is/c++draft/fs.path.type.cvt#1>:
> (though the definition provided here appears to be specific to path
> names).
> "The /native encoding/ of an ordinary character string is the
> operating system dependent current encoding for path names. The
> /native encoding/ for wide character strings is the
> implementation-defined execution wide-character set encoding."

We discussed the problems with the choice of normative wording in
http://eel.is/c++draft/fs.class.path#fs.path.cvt, if you remember,
during SG16's discussion of filesystem::path_view.

The problem is that filesystem paths can have a different encoding and
interpretation per path component, i.e. for a path

/A/B/C/D

... A, B, C and D may each have their own encoding and
interpretation, depending on the mount points and filesystems configured
on the current system. This is not what the current normative wording
suggests: it appears to assume that some mapping exists between C++
paths and OS kernel paths.

There *is* a mapping, but it is 100% C++-side. The OS kernel generally
consumes arrays of bytes.

A more correct normative wording would separate these two kinds of path
representation more clearly. OS kernel paths are arrays of `byte`, with
certain implementation-defined byte sequences not permitted. C++
paths can be in char, wchar_t, char8_t, char16_t, char32_t etc., and
there are well-defined conversions between those C++ paths and the array
of bytes supplied to the OS kernel. The standard can say nothing useful
about how the OS kernel may interpret the byte array C++ supplies to it.
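
To make the split concrete, here is a minimal sketch of the C++-side half
of that mapping. `kernel_bytes` is a hypothetical helper of mine, not
anything in the standard or in the path_view proposal; it only exposes the
native representation of a std::filesystem::path as raw bytes. I am
assuming a POSIX-like platform where path::value_type is char and the
native string is passed to the kernel as-is. On Windows the native form is
wchar_t, so the byte count differs.

```cpp
#include <cstddef>
#include <filesystem>
#include <vector>

// Hypothetical helper (an illustration, not a standard facility): return the
// native representation of a C++ path as an array of bytes. On a POSIX-like
// platform this is assumed to be what gets handed to the OS kernel.
std::vector<std::byte> kernel_bytes(const std::filesystem::path& p) {
    using value_type = std::filesystem::path::value_type;
    const auto& native = p.native();
    const auto* first = reinterpret_cast<const std::byte*>(native.data());
    // The conversion from char/wchar_t/char16_t/char32_t sources has already
    // happened in the path constructor; here we only reinterpret the result.
    return std::vector<std::byte>(first,
                                  first + native.size() * sizeof(value_type));
}
```

For a pure ASCII path such as /A/B/C/D, constructing the path from a
narrow, wide, char16_t or char32_t literal should yield the same byte
array. What each component then *means* is up to whichever filesystem is
mounted there, which is exactly the part C++ cannot see.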

If path_view starts the standards track, I'll need to propose a document
fixing up http://eel.is/c++draft/fs.class.path#fs.path.cvt in any case.
But to come back to your original question: I think you ought to
split off filesystem paths from everything else and consider them
separately. You'll then find it much easier to make the non-path
normative wording more consistent.

Niall

Received on 2019-08-13 00:08:41