Re: [SG16-Unicode] It's Time to Stop Adding New Features for Non-Unicode Execution Encodings in C++

From: Thiago Macieira <thiago_at_[hidden]>
Date: Tue, 30 Apr 2019 08:17:20 -0700
On Tuesday, 30 April 2019 07:49:50 PDT Tom Honermann wrote:
> Can you elaborate on this? What do you mean by the "kernel assuming
> your userspace is UTF-8"? Do you mean that the filesystem driver will,
> by default, present file names composed of 16-bit code units
> transcoded to UTF-8? Given that file names do not have an
> explicit encoding, this seems reasonable to me and even necessary to
> avoid name conflicts from otherwise lossy transcoding operations.

That's exactly what I meant. Both VFAT and NTFS store filenames in UTF-16, so
the kernel must translate to and from that to some 8-bit encoding chosen at
mount time so those names can be presented to userspace. Actually, the driver
must translate because the *kernel* VFS layer requires 8-bit filenames anyway.

This means filenames on VFAT and NTFS *do* have an encoding. You cannot use
arbitrary binary file names, since those wouldn't convert to UTF-16 and
couldn't be saved. Quite frankly, you shouldn't choose any iocharset=
other than UTF-8, since there could be file names on disk that wouldn't
convert to a narrower 8-bit charset and so couldn't be represented in
userspace.
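
To make both failure modes concrete, here is a minimal sketch. It is not
the kernel's code. The helper names is_valid_utf8 and fits_latin1 are
hypothetical, and ISO-8859-1 is only assumed as an example of a non-UTF-8
iocharset= value. The first function rejects byte strings that are not
well-formed UTF-8 and so have no UTF-16 form to store on disk. The second
rejects UTF-16 names that a Latin-1 mount could not present to userspace.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: true if `name` is well-formed UTF-8, i.e. it could
// be transcoded to UTF-16 for storage on VFAT or NTFS.
bool is_valid_utf8(const std::string& name) {
    const std::size_t n = name.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(name[i]);
        std::size_t len = 0;      // total bytes in this sequence
        std::uint32_t cp = 0;     // decoded code point
        std::uint32_t min = 0;    // smallest code point allowed for `len`
        if (b0 < 0x80)                { len = 1; cp = b0;        min = 0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
        else return false;        // stray continuation or invalid lead byte
        if (n - i < len) return false;              // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(name[i + k]);
            if ((b & 0xC0) != 0x80) return false;   // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min) return false;                 // overlong encoding
        if (cp > 0x10FFFF) return false;            // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        i += len;
    }
    return true;
}

// Hypothetical helper: true if every UTF-16 code unit of an on-disk name
// fits in ISO-8859-1, i.e. a Latin-1 mount could present it losslessly.
bool fits_latin1(const std::u16string& name) {
    for (char16_t unit : name) {
        if (unit > 0xFF) return false;
    }
    return true;
}
```

A name such as "\xFF\xFE.bin" fails the first check, so it could not be
saved at all. A name made of CJK characters fails the second, so it could
not be shown under a Latin-1 mount, while a UTF-8 mount can represent any
well-formed UTF-16 name.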

Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel System Software Products

Received on 2019-04-30 17:24:59