sg16: Re: [SG16-Unicode] [wg14/wg21 liaison] [isocpp-core] Source file encoding (was: What is the proper term for the locale dependent run-time character set/encoding used for the character classification and conversion functions?)

From: Corentin <corentin.jabot_at_[hidden]>
Date: Wed, 14 Aug 2019 19:55:41 +0200

On Wed, Aug 14, 2019, 7:36 PM Niall Douglas <s_sourceforge_at_[hidden]> wrote:

> > The present implementation-defined interpretation of the byte sequence in
> > source files allows a default of "UTF-8 in strings, comments can use
> > arbitrary bytes" (which thus allows existing source files in a range of
> > ASCII-compatible 8-bit character sets if the non-ASCII characters only
> > appear in comments, without needing to tell the compiler which character
> > set is being used). That approach (which is what GCC does by default)
> > seems more friendly to users with existing source files using various
> > character sets in comments than strictly requiring everything to be UTF-8
> > (even in comments) unless the compiler is explicitly told otherwise.
>
> I would find that choice unhelpful for tooling which processes C++
> source code, e.g. Python, which insists that text you feed it is either
> correctly encoded or not text at all. And that's not unreasonable: either
> text is encoded correctly, or it is not.
>
> What do you think of my "all 7-bit clean ASCII" proposal, with #pragma
> encoding (if supported by your C compiler) to opt out?
>

That seems like a step backwards. It's basically what people have had to do
for the past 40 years.

As always, the issue here is that we lack data about what people actually
put in their strings :(

>
> Niall
> _______________________________________________
> SG16 Unicode mailing list
> Unicode_at_[hidden]
> http://www.open-std.org/mailman/listinfo/unicode
>

Received on 2019-08-14 19:55:55