Subject: Re: [SG16-Unicode] [isocpp-core] What is the proper term for the locale dependent run-time character set/encoding used for the character classification and conversion functions?
From: Corentin (corentin.jabot_at_[hidden])
Date: 2019-08-13 11:55:07
On Tue, Aug 13, 2019, 6:34 PM Thiago Macieira <thiago_at_[hidden]> wrote:
> On Monday, 12 August 2019 19:15:14 PDT Tom Honermann wrote:
> > On 8/12/19 4:24 PM, Thiago Macieira wrote:
> > > This has broken down in recent decades because Clang and GCC do a
> > > pass-through from the source charset to the narrow execution charset.
> > > you can't get the same for non-ASCII. The following source, if encoded as
> > > Latin1:
> > >
> > > char str[] = "é";
> > >
> > > will not behave properly under UTF-8 execution charset at runtime. I
> > > don't know if -finput-charset=latin1 makes a difference.
> > Use of -finput-charset=latin1 does suffice for gcc to DTRT.
> > It is a little disappointing that no warning is issued, even when
> > -finput-charset=utf-8 is specified.
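To make the mismatch above concrete, here is a minimal sketch (not from the thread) that spells out the bytes with \x escapes so it does not depend on how this message or the source file is encoded. It assumes the character in question is U+00E9; the array names and the helper is_utf8_e_acute are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// U+00E9 as it sits in a Latin-1 encoded source file: a single byte, 0xE9.
// With a byte-for-byte pass-through, this is what lands in the literal.
const char latin1_bytes[] = "\xE9";

// U+00E9 as code expecting UTF-8 at runtime would need it: two bytes.
const char utf8_bytes[] = "\xC3\xA9";

static_assert(sizeof(latin1_bytes) == 2, "one byte plus the terminating NUL");
static_assert(sizeof(utf8_bytes) == 3, "two bytes plus the terminating NUL");

// Hypothetical helper: true if s is exactly the UTF-8 encoding of U+00E9.
bool is_utf8_e_acute(const char* s) {
    return std::strcmp(s, "\xC3\xA9") == 0;
}
```

The lone 0xE9 byte is not a valid UTF-8 sequence, which is why the passed-through literal misbehaves when the runtime treats narrow strings as UTF-8.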
> Right, but on the other hand that's actually nice, because you can have
> data in your source code and not get the compiler complaining at you. So
> long as you escape NULs, you could probably just dump a binary file in a raw,
> narrow-character (byte) string literal.
> (if anyone is thinking about that, I don't recommend it. You're going to
> run into size limits: ICC at 512kB and MSVC at 256kB. Use something like
> xxd -i to generate a brace-delimited array instead)
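For reference, a rough sketch of the brace-delimited form suggested above, in the general shape that xxd -i produces (an unsigned char array plus a length variable named after the input file). The file name blob.bin, its six bytes, and the checksum helper are all invented for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Roughly what `xxd -i blob.bin` would emit for a hypothetical 6-byte file.
unsigned char blob_bin[] = {
  0x89, 0x00, 0xff, 0x0a, 0x00, 0x41
};
unsigned int blob_bin_len = 6;

// Hypothetical helper: adds up all bytes, to show that embedded NULs and
// bytes above 0x7f are ordinary array elements with no escaping involved.
unsigned checksum(const unsigned char* data, std::size_t len) {
    unsigned sum = 0;
    for (std::size_t i = 0; i < len; ++i) {
        sum += data[i];
    }
    return sum;
}
```

Because every element is an integer literal, no source or execution character set conversion applies to the data.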
Afaik that works if you use \x to escape every byte; otherwise some
implementations will mess with your data. Nothing is guaranteed to be
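A small sketch of what escaping every byte with \x looks like, using made-up data; the names blob and count_nuls are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Every byte is written as a \x escape, including the NULs and printable
// bytes such as 0x41 ('A'). A hex escape consumes all hex digits that
// follow it, so an unescaped 'A' right after "\x00" would be read as part
// of that escape. Escaping every byte avoids this.
const char blob[] = "\x89\x00\xff\x0a\x00\x41";

// Six data bytes plus the terminating NUL the literal always appends.
static_assert(sizeof(blob) == 7, "6 data bytes + terminating NUL");

// Hypothetical helper: counts zero bytes in the first len bytes of data.
std::size_t count_nuls(const char* data, std::size_t len) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (data[i] == '\0') {
            ++n;
        }
    }
    return n;
}
```

Note that the length has to come from sizeof(blob) - 1 rather than strlen, since strlen stops at the first embedded NUL.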
> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
> Software Architect - Intel System Software Products
> SG16 Unicode mailing list