Subject: Re: A UTF-8 environment specification; an alternative to assuming UTF-8 based on choice of literal encoding
From: Thiago Macieira (thiago_at_[hidden])
Date: 2021-07-29 10:12:03

On Wednesday, 28 July 2021 16:09:32 PDT Charlie Barto via SG16 wrote:
> > int main(int argc, char8_t** args, char8_t** env)
> Yeah I think anything like this should be specified to be WTF-8, even on
> posix making them actual utf-8 would break file path arguments. With WTF-8
> you can round trip to the original sequence of potentially ill formed
> utf-16 code units.

That statement is misleading because it's mixing two things.

You're saying it should be WTF-8 because on Windows, it can be used to hold
improperly-encoded UTF-16 file paths.

And you're saying that because it would be WTF-8 on Windows, it should be
WTF-8 on POSIX systems too.

Both suggestions are fine. I agree with them.

The problem is that the way you wrote it makes it sound like WTF-8 can be
used to hold invalid file paths on POSIX systems and round-trip those to
UTF-16. That doesn't work: WTF-8 only extends UTF-8 with lone surrogates, and
it has no representation for the arbitrary bytes a POSIX path may contain.
Therefore, any cross-platform content that attempts to transcode to UTF-16
will have to deal with undecodable paths anyway.
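
To make the distinction concrete, here is a rough sketch of both directions.
The names utf16_to_wtf8, wtf8_to_utf16 and append_cp are made up for this
example and are not from any proposal or library. It assumes WTF-8 means
"UTF-8 that also allows lone surrogates". The UTF-16 to WTF-8 direction cannot
fail, while the reverse direction has to be able to report failure for byte
strings such as a Latin-1 file name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper: append one code point (surrogates allowed) as 1-4 bytes.
static void append_cp(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Potentially ill-formed UTF-16 -> WTF-8. Total: every input has an encoding.
std::string utf16_to_wtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        bool high = cp >= 0xD800 && cp <= 0xDBFF;
        if (high && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Proper surrogate pair: combine into one supplementary code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        append_cp(out, cp);  // a lone surrogate is encoded as 3 bytes
    }
    return out;
}

// WTF-8 -> UTF-16. Partial: arbitrary bytes are rejected with nullopt.
std::optional<std::u16string> wtf8_to_utf16(const std::string& in) {
    static const std::uint32_t min_cp[] = {0, 0x80, 0x800, 0x10000};
    std::u16string out;
    bool prev_high = false;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t extra;
        if (b < 0x80)                    { cp = b;        extra = 0; }
        else if (b >= 0xC2 && b <= 0xDF) { cp = b & 0x1F; extra = 1; }
        else if (b >= 0xE0 && b <= 0xEF) { cp = b & 0x0F; extra = 2; }
        else if (b >= 0xF0 && b <= 0xF4) { cp = b & 0x07; extra = 3; }
        else return std::nullopt;        // not a valid lead byte (e.g. 0xFF)
        if (in.size() - i <= extra) return std::nullopt;  // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) return std::nullopt;  // bad continuation
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp[extra] || cp > 0x10FFFF) return std::nullopt;
        // A high surrogate followed by a low one must be a 4-byte sequence.
        bool low = cp >= 0xDC00 && cp <= 0xDFFF;
        if (low && prev_high) return std::nullopt;
        prev_high = cp >= 0xD800 && cp <= 0xDBFF;
        i += extra + 1;
        if (cp >= 0x10000) {
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<char16_t>(cp);
        }
    }
    return out;
}

// Illustration only: a Windows-style name survives, a Latin-1 POSIX one doesn't.
inline void demo() {
    std::u16string win = {u'a', char16_t(0xD800), u'b'};   // lone surrogate
    auto back = wtf8_to_utf16(utf16_to_wtf8(win));
    assert(back && *back == win);
    std::string posix = "caf\xE9";                         // Latin-1 bytes
    assert(!wtf8_to_utf16(posix));
}
```

So the Windows side always round-trips, but code that receives a POSIX path
and wants UTF-16 still needs a failure path of its own.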

Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel DPG Cloud Engineering

SG16 list run by sg16-owner@lists.isocpp.org