Re: [SG16-Unicode] P1689: Encoding of filenames for interchange

From: Tom Honermann <tom_at_[hidden]>
Date: Thu, 5 Sep 2019 11:34:33 -0400
On 9/5/19 8:57 AM, Niall Douglas wrote:
> On 05/09/2019 12:11, Lyberta wrote:
>> Do we really expect C++20 build systems to run on filesystem paths not
>> representable as UTF-8? Users who do that really shoot themselves in the
>> foot.
> As Tom likes to say, EBCDIC.

Lyberta stated "representable as UTF-8" and as far as I know, all EBCDIC
code pages support an isomorphic translation to Unicode.

That being said, there is no guarantee that a filename is valid EBCDIC.
If an EBCDIC code page leaves some code point values unassigned (I don't
know whether any such code pages exist; I would guess not), then a
filename that uses such a code point would contain a byte that does not
specify a character.

The more complicated scenario is where filenames have no associated
encoding and must be interpreted according to the current locale, and the
locale specifies a non-trivial encoding (like UTF-8 or Shift-JIS) where
decoding errors are possible. In this case, the only solution for
accurate filename transmission is a binary encoding, one that carries the
raw bytes of the filename unchanged.
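
To make that concrete, here is a minimal sketch of one way a binary
fallback could work. It is not the format P1689 specifies; the record
keys "utf8" and "base64" and both function names are hypothetical, chosen
only for illustration. Filenames that decode as strict UTF-8 travel as
text, and anything else travels as base64 of the raw bytes, so the
receiver can always recover the exact byte sequence.

```python
import base64
import json


def encode_filename(raw: bytes) -> dict:
    """Build a hypothetical interchange record for one filename.

    Valid UTF-8 is sent as readable text; anything else falls back to
    base64 of the raw bytes so no information is lost.
    """
    try:
        return {"utf8": raw.decode("utf-8")}  # strict decoding, raises on bad bytes
    except UnicodeDecodeError:
        return {"base64": base64.b64encode(raw).decode("ascii")}


def decode_filename(record: dict) -> bytes:
    """Recover the exact original bytes from a record."""
    if "utf8" in record:
        return record["utf8"].encode("utf-8")
    return base64.b64decode(record["base64"])


# One ordinary name and one temporary-file style name that is not valid UTF-8.
names = [b"main.cpp", b"tmp\xff\xfe.o"]

# Serialize to JSON (assumed here as the interchange carrier) and read it back.
wire = json.dumps([encode_filename(n) for n in names])
restored = [decode_filename(r) for r in json.loads(wire)]
```

The text form is kept for the common case so the interchange file stays
readable, while the fallback covers names that the locale encoding cannot
decode.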

> P1689 ought to be a taker and support filenames which are invalid UTF.
> I'm particularly thinking of temporary file names which build systems
> often deal with, as some temporary file name generators make invalid
> UTF. And that's totally legal on the filesystem.

+1

Tom.

Received on 2019-09-05 17:34:38