Subject: Re: [SG16-Unicode] It's Time to Stop Adding New Features for Non-Unicode Execution Encodings in C++
From: Thiago Macieira (thiago_at_[hidden])
Date: 2019-04-30 10:17:20
On Tuesday, 30 April 2019 07:49:50 PDT Tom Honermann wrote:
> Can you elaborate on this? What do you mean by the "kernel assuming
> your userspace is UTF-8"? Do you mean that the filesystem driver will,
> by default, present file names composed of 16-bit code units
> transcoded to UTF-8? Given that file names do not have an
> explicit encoding, this seems reasonable to me and even necessary to
> avoid name conflicts from otherwise lossy transcoding operations.
That's exactly what I meant. Both VFAT and NTFS store filenames in UTF-16, so
the kernel must translate between that and some 8-bit encoding chosen at
mount time so those names can be presented to userspace. Actually, the driver
must translate because the *kernel* VFS layer requires 8-bit filenames anyway.
This means filenames on VFAT and NTFS *do* have an encoding. You cannot use
arbitrary binary file names, since those wouldn't convert to UTF-16 and
couldn't be saved. Quite frankly, you shouldn't choose any iocharset=
other than UTF-8, since with any other choice there could be file names on
disk that wouldn't convert and couldn't be represented.
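To illustrate why arbitrary bytes cannot be stored as such a file name, here is a minimal sketch of a strict UTF-8 to UTF-16 conversion. It is not the kernel driver's code; the function name utf8_to_utf16 is made up for this example, and it assumes C++17 and a policy of rejecting malformed input rather than substituting a replacement character.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: strictly decode UTF-8 and re-encode it as UTF-16.
// Returns std::nullopt when the input is not well-formed UTF-8, which is
// the case for "arbitrary binary" names.
std::optional<std::u16string> utf8_to_utf16(std::string_view in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const auto b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;      // code point being assembled
        std::size_t len = 0;  // length of the sequence in bytes
        char32_t min = 0;     // smallest code point allowed for this length
        if (b0 < 0x80)                { cp = b0;        len = 1; min = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; min = 0x10000; }
        else return std::nullopt;  // stray continuation or invalid lead byte

        if (in.size() - i < len)
            return std::nullopt;   // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const auto b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                return std::nullopt;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong forms, values past U+10FFFF, and surrogates.
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return std::nullopt;

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

With this sketch, a name such as "caf\xC3\xA9" converts, while a name containing a lone 0xFF byte yields std::nullopt: there is no UTF-16 form to write to disk.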
-- Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org Software Architect - Intel System Software Products