Re: [SG16-Unicode] P1689: Encoding of filenames for interchange

From: Thiago Macieira <thiago_at_[hidden]>
Date: Tue, 10 Sep 2019 17:01:00 -0700
On Sunday, 8 September 2019 15:12:28 PDT Niall Douglas wrote:
> The results are as follows:
> Linux accepts any filename containing anything except /. NUL early
> truncates the filename used.
> The NT kernel object lookup accepts anything except \. NUL works fine.
> NTFS specifically, as of Windows 10.1903, bans the following codepoint
> values:
> 0-31 " * / < > ? \ |
> Colons in the filename are accepted, but cause early truncation of the
> filename used.

Colons in the file name indicate an Alternate Data Stream, which is like a
sub-file of a file. Depending on how you write your code, it's not truncation;
it's that you see the main name only.
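
To make the distinction concrete, here is a rough C++17 sketch of the two points above: the character set Niall reports NTFS rejecting, and a split at the colon into main name and stream name. The names ntfs_rejects, ntfs_accepts and split_stream are hypothetical helpers made up for illustration, not Windows APIs. The sketch assumes a single path component in UTF-16 (no drive letter, no directory separators) and ignores everything else Windows may object to, such as reserved device names.

```cpp
#include <string_view>
#include <utility>

// Code points the quoted test found NTFS to reject (Windows 10 1903):
// 0-31 and " * / < > ? \ |   (hypothetical helper, not a Windows API)
constexpr bool ntfs_rejects(char16_t c)
{
    if (c < 32)
        return true;
    constexpr std::u16string_view banned = u"\"*/<>?\\|";
    return banned.find(c) != std::u16string_view::npos;
}

// True if no code unit of the component is in the rejected set.
// A colon passes this check: it is accepted, but it has a meaning.
constexpr bool ntfs_accepts(std::u16string_view component)
{
    for (char16_t c : component)
        if (ntfs_rejects(c))
            return false;
    return true;
}

// Splits one component at the first colon into {main name, stream name}.
// The stream name is empty when the component contains no colon.
constexpr std::pair<std::u16string_view, std::u16string_view>
split_stream(std::u16string_view component)
{
    const auto colon = component.find(u':');
    if (colon == std::u16string_view::npos)
        return {component, std::u16string_view{}};
    return {component.substr(0, colon), component.substr(colon + 1)};
}
```

With this, split_stream(u"report.txt:notes") gives the main name "report.txt" and the stream name "notes". Code that then lists the directory sees only "report.txt", which is what looks like truncation.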

> If case insensitive filename matching is on (the default is yes, forced
> by a system-wide registry flag), much travesty is done to the codepoint
> space. Obvious stuff like Roman 'a' and 'A' are considered identical,
> but so is 'a' and '' and '' and many more codepoints. I saw several
> hundred codepoints out of the 65,536 space considered identical. I guess
> at least we know Microsoft have implemented Unicode correctly, for some
> definition of correct.

But not normalisation, right?
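
To spell out what I am asking, here is a small C++17 sketch of the same visible name in two Unicode forms. It assumes a filesystem that compares names code unit by code unit without normalising them first. Whether NTFS does that is exactly the question, so this shows the consequence and says nothing about how NTFS behaves. The names precomposed, decomposed and same_code_units are invented for the example.

```cpp
#include <string_view>

// "café" with the accented letter as one precomposed code point, U+00E9.
constexpr std::u16string_view precomposed = u"caf\u00E9";

// "café" with 'e' followed by U+0301 COMBINING ACUTE ACCENT.
constexpr std::u16string_view decomposed = u"cafe\u0301";

// Comparison as a non-normalising filesystem would do it (an assumption):
// plain code unit equality, so canonically equivalent names can differ.
constexpr bool same_code_units(std::u16string_view a, std::u16string_view b)
{
    return a == b;
}
```

The two strings render identically, but they have 4 and 5 UTF-16 code units respectively and same_code_units() returns false for them. Without normalisation they would be two distinct file names.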

Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel System Software Products

Received on 2019-09-11 02:01:02