Date: Thu, 5 Sep 2019 11:34:33 -0400
On 9/5/19 8:57 AM, Niall Douglas wrote:
> On 05/09/2019 12:11, Lyberta wrote:
>> Do we really expect C++20 build systems to run on filesystem paths not
>> representable as UTF-8? Users who do that really shoot themselves in the
>> foot.
> As Tom likes to say, EBCDIC.
Lyberta stated "representable as UTF-8" and as far as I know, all EBCDIC
code pages support an isomorphic translation to Unicode.
That being said, there is no guarantee that a filename is valid EBCDIC.
If EBCDIC code pages exist that don't assign characters for all code
point values (I don't know if any such code pages exist; I would guess
not), then use of such code points in a filename would not specify a
character.
The more complicated scenario is where filenames have no associated
encoding and must be interpreted according to the current locale and the
locale specifies a non-trivial encoding (like UTF-8 or Shift-JIS) where
decoding errors are possible. In this case, the only solution for
accurate filename transmission is a binary encoding.
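To make the "binary encoding" fallback concrete, here is a minimal C++ sketch. It is only an illustration of the idea and does not describe anything P1689 actually specifies. It assumes that a filename arrives as raw bytes. Names that are well-formed UTF-8 are passed through as text. Anything else is sent as lowercase hex of its bytes, together with a flag that marks it as binary. `EncodedPath`, `encode_path`, and `decode_path` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// True if `s` is well-formed UTF-8 per the Unicode Standard's table of
// well-formed byte sequences: rejects overlong forms, surrogates
// (U+D800..U+DFFF), code points above U+10FFFF, and truncated sequences.
bool is_valid_utf8(std::string_view s) {
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) { ++i; continue; }          // ASCII
        std::size_t len;
        unsigned char lo = 0x80, hi = 0xBF;       // allowed range of 2nd byte
        if (c >= 0xC2 && c <= 0xDF) len = 2;
        else if (c == 0xE0) { len = 3; lo = 0xA0; }   // no overlongs
        else if (c >= 0xE1 && c <= 0xEC) len = 3;
        else if (c == 0xED) { len = 3; hi = 0x9F; }   // no surrogates
        else if (c >= 0xEE && c <= 0xEF) len = 3;
        else if (c == 0xF0) { len = 4; lo = 0x90; }   // no overlongs
        else if (c >= 0xF1 && c <= 0xF3) len = 4;
        else if (c == 0xF4) { len = 4; hi = 0x8F; }   // <= U+10FFFF
        else return false;                        // 0x80..0xC1, 0xF5..0xFF
        if (i + len > n) return false;            // truncated sequence
        const unsigned char c1 = static_cast<unsigned char>(s[i + 1]);
        if (c1 < lo || c1 > hi) return false;
        for (std::size_t k = 2; k < len; ++k) {
            const unsigned char ck = static_cast<unsigned char>(s[i + k]);
            if (ck < 0x80 || ck > 0xBF) return false;  // continuation byte
        }
        i += len;
    }
    return true;
}

// Hypothetical wire form: text when the name is valid UTF-8, otherwise
// hex of the raw bytes, so no filename is ever lost or altered.
struct EncodedPath {
    bool is_binary;
    std::string value;
};

EncodedPath encode_path(std::string_view raw) {
    if (is_valid_utf8(raw)) return {false, std::string(raw)};
    static const char digits[] = "0123456789abcdef";
    std::string hex;
    hex.reserve(raw.size() * 2);
    for (char ch : raw) {
        const unsigned char b = static_cast<unsigned char>(ch);
        hex.push_back(digits[b >> 4]);
        hex.push_back(digits[b & 0x0F]);
    }
    return {true, hex};
}

// Inverse of encode_path: recovers the original bytes exactly.
std::string decode_path(const EncodedPath& e) {
    if (!e.is_binary) return e.value;
    assert(e.value.size() % 2 == 0);  // hex form always has an even length
    auto nibble = [](char h) -> unsigned {
        return h <= '9' ? static_cast<unsigned>(h - '0')
                        : static_cast<unsigned>(h - 'a' + 10);
    };
    std::string raw;
    raw.reserve(e.value.size() / 2);
    for (std::size_t i = 0; i + 1 < e.value.size(); i += 2)
        raw.push_back(static_cast<char>((nibble(e.value[i]) << 4) |
                                        nibble(e.value[i + 1])));
    return raw;
}
```

A temporary name such as `"tmp\xff\xfe.o"` is not valid UTF-8. It would therefore be sent as the binary form `746d70fffe2e6f` and decoded back to the identical bytes. `"main.cpp"` would stay as plain text. The explicit flag is the important design choice. Without it, a hex string could not be distinguished from an ordinary filename that happens to look like hex.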
> P1689 ought to be a taker and support filenames which are invalid UTF.
> I'm particularly thinking of temporary file names which build systems
> often deal with, as some temporary file name generators make invalid
> UTF. And that's totally legal on the filesystem.
+1
Tom.
Received on 2019-09-05 17:34:38