Re: [SG16] A UTF-8 environment specification; an alternative to assuming UTF-8 based on choice of literal encoding

From: Thiago Macieira <thiago_at_[hidden]>
Date: Thu, 29 Jul 2021 08:12:03 -0700
On Wednesday, 28 July 2021 16:09:32 PDT Charlie Barto via SG16 wrote:
> > int main(int argc, char8_t** args, char8_t** env)
> Yeah I think anything like this should be specified to be WTF-8, even on
> posix making them actual utf-8 would break file path arguments. With WTF-8
> you can round trip to the original sequence of potentially ill formed
> utf-16 code units.

That statement is misleading because it's mixing two things.

You're saying it should be WTF-8 because on Windows, it can be used to hold
improperly-encoded UTF-16 file paths.

And you're saying that because it would be WTF-8 on Windows, it should be
WTF-8 on POSIX systems too.

Both suggestions are fine. I agree with them.

The problem is that the way you wrote it makes it sound like WTF-8 can be
used to hold invalid file paths on POSIX systems and round-trip those to
UTF-16. That doesn't work: POSIX file names are arbitrary byte sequences, and
bytes that are not valid UTF-8 are not valid WTF-8 either. Therefore, any
cross-platform content that attempts to transcode to UTF-16 will have to deal
with undecodable paths anyway.
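
To illustrate the point above, here is a minimal sketch of a simplified
WTF-8 to UTF-16 decoder. The name wtf8_to_utf16 is hypothetical, and the
decoder is deliberately loose: it accepts three-byte encodings of lone
surrogates, as WTF-8 does, but it does not enforce WTF-8's rule that a
surrogate pair must be encoded as one four-byte sequence. Even so, a Latin-1
file name, which is perfectly legal on POSIX, fails to decode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not from any library): decode WTF-8-style bytes into
// UTF-16 code units. Returns std::nullopt when the input is not decodable.
std::optional<std::u16string> wtf8_to_utf16(std::string_view in)
{
    // Smallest code point allowed for each sequence length (rejects overlongs).
    static constexpr std::uint32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};

    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const auto b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else return std::nullopt;     // stray continuation byte or 0xF8..0xFF
        assert(len >= 1 && len <= 4);

        if (in.size() - i < len) return std::nullopt;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const auto b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) return std::nullopt;  // not a continuation
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min_cp[len] || cp > 0x10FFFF) return std::nullopt;

        if (cp < 0x10000) {
            // Includes lone surrogates U+D800..U+DFFF, which WTF-8 permits.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}
```

With this sketch, wtf8_to_utf16("\xED\xA0\x80") yields the single unpaired
code unit 0xD800, which is the Windows round-trip case. By contrast,
wtf8_to_utf16("caf\xE9") yields std::nullopt, so the caller still needs a
separate strategy for such POSIX paths.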

Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel DPG Cloud Engineering

Received on 2021-07-29 10:12:08