Re: [SG16-Unicode] P1689: Encoding of filenames for interchange

From: Niall Douglas <s_sourceforge_at_[hidden]>
Date: Sun, 8 Sep 2019 23:12:28 +0100
> If I get a chance over the weekend, I'll quickly write you a LLFIO
> program testing which characters in filenames Windows accepts.

Firstly, you can find said test program at
https://github.com/ned14/llfio/tree/develop/programs/illegal-codepoints.
On Windows, this operates in the NT kernel realm with POSIX semantics,
so there is no Win32 translation layer in the way.

The results are as follows:

Linux accepts a filename containing any byte except /. A NUL truncates
the filename used early.
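To make the Linux rule concrete, here is a minimal sketch of a leaf-name check that follows the result above. The function name linux_leaf_is_storable() is hypothetical, not an LLFIO API. It rejects '/' and also rejects NUL, because a NUL would silently truncate the name once it passes through the NUL-terminated C APIs. Rejecting the empty string is an added assumption.

```cpp
#include <string_view>

// Hypothetical helper (not part of LLFIO): true if `leaf` can be stored
// verbatim as a single Linux path component, per the test results above.
// Linux filenames are arbitrary byte sequences; only '/' is forbidden.
// An embedded NUL is also rejected, since C APIs would truncate the name there.
constexpr bool linux_leaf_is_storable(std::string_view leaf) noexcept
{
  if (leaf.empty())  // assumption: an empty name is not a usable filename
    return false;
  for (char c : leaf)
  {
    if (c == '/' || c == '\0')
      return false;
  }
  return true;
}
```

For example, "report:2019 *?|.txt" passes, while "a/b" fails.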

The NT kernel object lookup accepts anything except \. NUL works fine.

NTFS specifically, as of Windows 10 version 1903, bans the following
codepoint values:

0-31 " * / < > ? \ |

Colons in the filename are accepted, but cause early truncation of the
filename used.
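As a sketch of the NTFS results, the check below applies the banned list above to UTF-16 code units. The name ntfs_leaf_avoids_banned() and its reject_colon parameter are hypothetical. The parameter defaults to rejecting ':' because NTFS accepts a colon but truncates the name there. The function only encodes what the test program observed on 1903. It is not a complete Win32 naming validator, and it does not cover reserved device names.

```cpp
#include <string_view>

// Hypothetical helper: true if a UTF-16 leaf name avoids every code unit the
// test program found NTFS rejecting (Windows 10 version 1903).
// ':' is accepted by NTFS but truncates the name, so it is rejected by default.
constexpr bool ntfs_leaf_avoids_banned(std::u16string_view leaf,
                                       bool reject_colon = true) noexcept
{
  constexpr std::u16string_view banned = u"\"*/<>?\\|";
  if (leaf.empty())  // assumption: an empty name is not a usable filename
    return false;
  for (char16_t c : leaf)
  {
    if (c < 32)  // 0-31: control characters
      return false;
    if (banned.find(c) != std::u16string_view::npos)
      return false;
    if (reject_colon && c == u':')
      return false;
  }
  return true;
}
```

Non-ASCII names such as u"r\u00e9sum\u00e9.txt" pass, while u"what?.txt" and names containing a tab fail.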

If case insensitive filename matching is on (the default is yes, forced
by a system-wide registry flag), much travesty is done to the codepoint
space. Obvious pairs like Latin 'a' and 'A' are considered identical,
but so are 'a' and 'Á' and 'á', and many more codepoints. I saw several
hundred codepoints out of the 65,536-codepoint space considered identical. I guess
at least we know Microsoft have implemented Unicode correctly, for some
definition of correct.

If case insensitive filename matching is off (the system-wide registry
flag is off, or the directory has the case-sensitive setting enabled),
then everything works via memcmp(), and all is good.
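The two matching modes can be sketched side by side. The case-sensitive comparison is a plain code-unit comparison, equivalent for equality to memcmp() over the two UTF-16 buffers. The case-insensitive comparison maps each code unit through an upcase function before comparing. All names are hypothetical. toy_upcase() is a stand-in that folds only ASCII 'a'-'z', so it does not reproduce the much wider equivalences observed above.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Case-sensitive mode: names match only if their UTF-16 code units are
// identical, which is what memcmp() over the two buffers reports for equality.
constexpr bool names_match_sensitive(std::u16string_view a,
                                     std::u16string_view b) noexcept
{
  return a.size() == b.size() &&
         std::char_traits<char16_t>::compare(a.data(), b.data(), a.size()) == 0;
}

// Hypothetical stand-in for the real per-code-unit upcase mapping:
// it only folds ASCII 'a'-'z' to 'A'-'Z' and leaves everything else alone.
constexpr char16_t toy_upcase(char16_t c) noexcept
{
  return (c >= u'a' && c <= u'z') ? static_cast<char16_t>(c - (u'a' - u'A')) : c;
}

// Case-insensitive mode: map every code unit, then compare.
constexpr bool names_match_insensitive(std::u16string_view a,
                                       std::u16string_view b) noexcept
{
  if (a.size() != b.size())
    return false;
  for (std::size_t i = 0; i < a.size(); ++i)
  {
    if (toy_upcase(a[i]) != toy_upcase(b[i]))
      return false;
  }
  return true;
}
```

With these definitions, u"Readme.txt" and u"README.TXT" match only in the case-insensitive mode.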

All in all, I'd say you were more correct than I was, Tom: NTFS imposes
considerably stricter requirements on acceptable filename codepoints
than my memory recalled. Apologies for the noise.

I would also say that much has changed in Windows in recent years. Back
when I started LLFIO, if you didn't specify OBJ_CASE_INSENSITIVE to the
NT kernel, it gave you case sensitive lookup. Sometime in the years
since then, Microsoft have implemented a global override: that flag, or
the lack of it, is now ignored in a default Windows 10 install. That's new.

Niall

Received on 2019-09-09 01:12:32