Re: [SG16-Unicode] P1689: Encoding of filenames for interchange

From: Tom Honermann <tom_at_[hidden]>
Date: Thu, 5 Sep 2019 11:27:33 -0400
On 9/5/19 6:51 AM, Niall Douglas wrote:
> Firstly, NUL is a valid filesystem path codepoint on some platforms, and
> I'd like to get the standard fixed on that incorrectness in the near
> future. I think that we can reasonably declare the native path separator
> codepoint the only invalid filesystem path codepoint, as otherwise
> filesystem::path doesn't work.
Please don't try to fix this. I don't believe there is any use case
for supporting NUL characters within a path component. C, C++, POSIX,
and Win32 APIs have *never* supported this, and existing interfaces
cannot be updated to accommodate embedded NUL characters. Supporting
this would effectively break all existing code that deals with file
names, with no motivation.
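
To illustrate why existing interfaces cannot accommodate this, here is a minimal sketch, assuming a name held in a std::string and later handed to a NUL-terminated interface such as fopen() or open(). The helper names are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: the number of chars that an interface taking a
// NUL-terminated `const char*` would treat as the file name.
std::size_t length_seen_by_c_api(const std::string& name) {
    // strlen stops at the first NUL, exactly as fopen()/open() would.
    return std::strlen(name.c_str());
}

// Hypothetical helper: true if the name holds a NUL before its end,
// i.e. a NUL-terminated interface would silently truncate it.
bool has_embedded_nul(const std::string& name) {
    return name.find('\0') != std::string::npos;
}
```

A std::string built as std::string("foo\0bar", 7) holds seven chars, yet any such interface sees only "foo", so code that forwards names via c_str() would open the wrong file rather than fail.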
>
> Secondly, as I've often told you Thiago, the native Windows filesystem
> API is also byte based. struct UNICODE takes a *byte* length, not a
> wchar_t length. I'll agree that the Win32 path translation layer
> complicates that, but underneath it's all byte based, and I would like
> to hope that whatever modern i/o proposal WG21 chooses will expose
> reality on Windows.

I strongly disagree with the viewpoint that NTFS is a byte-based
filesystem. The fact that part of a filesystem-neutral interface is
weirdly designed (perhaps because it supports different filesystems,
some of which might actually be byte-based) does not mean that NTFS
doesn't store 16-bit code units.
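
As a rough sketch of the distinction, assuming the structure under discussion is the Windows UNICODE_STRING (whose Length field is a byte count while its buffer holds 16-bit code units), the following portable stand-in uses char16_t. The type and function names are hypothetical and this is not the Windows definition itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical, portable stand-in for a counted UTF-16 string whose
// lengths are expressed in bytes (as in UNICODE_STRING).
struct CountedUtf16 {
    std::uint16_t length_in_bytes;      // bytes used, no terminator
    std::uint16_t max_length_in_bytes;  // bytes available in buffer
    const char16_t* buffer;             // 16-bit code units
};

// Builds a view over `s`. Assumes `s` outlives the result and that its
// byte length fits in 16 bits.
CountedUtf16 make_counted(const std::u16string& s) {
    const auto bytes =
        static_cast<std::uint16_t>(s.size() * sizeof(char16_t));
    return CountedUtf16{bytes, bytes, s.data()};
}

// The unit of the stored data is still the 16-bit code unit; the byte
// length is simply twice the code unit count.
std::size_t code_unit_count(const CountedUtf16& c) {
    return c.length_in_bytes / sizeof(char16_t);
}
```

The length field being a byte count changes how the size is reported, not what the buffer contains: a name of four UTF-16 code units is reported as eight bytes.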

Tom.

Received on 2019-09-05 17:27:37