sg16

Re: [isocpp-lib] Why have we deprecated filesystem::u8path?

From: Niall Douglas <s_sourceforge_at_[hidden]>
Date: Tue, 29 Nov 2022 19:57:32 +0000
On 29/11/2022 04:09, Steve Downey via SG16 wrote:
> If I recall correctly, and it's been a while, so underlying details are
> fuzzy.
>
> Filesystems in general, and in particular NTFS, don't actually maintain
> any Unicode invariants. The sequence of octets you get from the file
> system for a path may not be a well-formed UTF-8 string. The only thing
> that will open the same file is the same set of octets used to open it
> in the first place, and the only way to construct a file path that
> corresponds to an existing file is to use the various path dirwalk
> algorithms without looking too closely. Treating a path as a UTF-8 string
> is at best slightly misleading, at worst an attack vector.
>
> But it's been a few years since we had these rounds of discussions in
> SG16, so CC'ing.
>
> I recall Niall having some horror stories also, but I don't have his
> address handy.

Odd, seeing as I only posted to lib-ext four days ago.

My recollection is that u8path was deprecated because char8_t was added,
and so it was felt that u8path needed to go away, even though that broke
a bunch of code for no especially good reason, in my opinion. I mean,
u8path did what it said on the tin. It wasn't malicious, and it neither
worsened nor damaged anything that wasn't already broken. It shouldn't
be used in new code, which is what I think WG21 was intending, but
producing compiler diagnostics for all your perfectly fine legacy code
I personally found overkill. Still, that ship has sailed.
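
For legacy code that wants to avoid the diagnostics, a migration shim might look roughly like the sketch below. The name path_from_utf8 is hypothetical, and the sketch assumes the caller still holds UTF-8 in a plain std::string. Where the library reports char8_t support, it copies the bytes into a std::u8string and uses the ordinary path constructor; otherwise it falls back to u8path.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper (not part of the standard library): builds a path from
// UTF-8 bytes held in a std::string without calling the deprecated u8path
// when char8_t support is available.
std::filesystem::path path_from_utf8(const std::string& utf8) {
#if defined(__cpp_lib_char8_t)
    // C++20 route: char8_t input to the path constructor is treated as UTF-8.
    // Each char is converted to char8_t one code unit at a time.
    return std::filesystem::path(std::u8string(utf8.begin(), utf8.end()));
#else
    // Pre-C++20 route: u8path is not deprecated here.
    return std::filesystem::u8path(utf8);
#endif
}
```

Neither branch validates that the input is well-formed UTF-8; that remains the caller's problem, as it was with u8path.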

You're right that filesystem paths can contain almost any sequence of
bits. The only disallowed codepoints are the path separator, and on
POSIX, the null codepoint.

I have highly antisocial code which generates a unique file name by
taking the SHA-256 hash, scanning it for the two disallowed codepoints,
replacing them with XORed values, and then feeding the result as a bag
of bits to the filesystem as a filename. ZFS is the only filesystem I've
personally used which gets upset with you doing this, and even it, I
think, is configurable about whether it complains.
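
A minimal sketch of that idea for POSIX, not the actual code: the helper name digest_to_filename and the XOR mask 0x80 are assumptions made for illustration, and computing the SHA-256 digest itself is left to whatever hashing library is in use.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Hypothetical helper: turns a 32-byte digest into a raw-byte POSIX file name.
// The XOR mask 0x80 is an assumption; any mask whose results are neither
// 0x00 nor '/' would serve.
std::string digest_to_filename(const std::array<std::uint8_t, 32>& digest) {
    std::string name;
    name.reserve(digest.size());
    for (std::uint8_t b : digest) {
        if (b == 0x00 || b == '/') {
            b ^= 0x80;  // 0x00 -> 0x80, 0x2F ('/') -> 0xAF
        }
        name.push_back(static_cast<char>(b));
    }
    return name;
}
```

The result is generally not valid UTF-8, which is the point. Note that this substitution maps two input bytes onto the same output byte, so the name carries very slightly less entropy than the digest.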

LLFIO is fine with this, but woe betide you if you ever list the
directory with a command line tool or, worse, a GUI filesystem explorer.
They generally segfault :). Hence the code being highly antisocial.

Niall

Received on 2022-11-29 19:57:33