Re: [SG16-Unicode] [isocpp-core] What is the proper term for the locale dependent run-time character set/encoding used for the character classification and conversion functions?

From: Corentin <corentin.jabot_at_[hidden]>
Date: Tue, 13 Aug 2019 18:55:07 +0200
On Tue, Aug 13, 2019, 6:34 PM Thiago Macieira <thiago_at_[hidden]> wrote:

> On Monday, 12 August 2019 19:15:14 PDT Tom Honermann wrote:
> > On 8/12/19 4:24 PM, Thiago Macieira wrote:
> > > This has broken down in recent decades because Clang and GCC do a
> > > pass-through from the source charset to the narrow execution charset. So
> > > you can't get the same for non-ASCII. The following source if encoded in
> > > Latin1:
> > >
> > > char str[] = "é";
> > >
> > > will not behave properly under UTF-8 execution charset at runtime. I don't
> > > know if -finput-charset=latin1 makes a difference.
> >
> > Use of -finput-charset=latin1 does suffice for gcc to DTRT.
> >
> > It is a little disappointing that no warning is issued, even when
> > -finput-charset=utf-8 is specified.
>
> Right, but on the other hand that's actually nice, because you can have binary
> data in your source code and not get the compiler complaining at you. So long
> as you escape NULs, you could probably just dump a binary file in a raw,
> narrow-character (byte) string literal.
>
> (if anyone is thinking about that, I don't recommend it. You're going to run
> into size limits: ICC at 512kB and MSVC at 256kB. Use something like xxd -i to
> generate a brace-delimited array instead)
>

AFAIK that only works if you use \x to escape every byte; otherwise some
implementations will mess with your data. Nothing else is guaranteed to be
passed through unchanged.
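
For illustration, here is a minimal sketch of the two approaches mentioned in
this thread: a narrow string literal in which every byte is written as a \x
escape, and a brace-delimited array of the kind xxd -i generates. The
identifiers (latin1_e_acute, utf8_e_acute, blob, blob_len) and the blob bytes
are made up for the example, and it assumes the usual 8-bit char.

```cpp
#include <cstddef>

// Every byte is spelled as a hex escape, so the stored bytes do not depend
// on the encoding of the source file or on the execution charset.
// Note that sizeof counts the implicit terminating NUL.
static const char latin1_e_acute[] = "\xE9";      // U+00E9 as Latin-1: 1 byte + NUL
static const char utf8_e_acute[]   = "\xC3\xA9";  // U+00E9 as UTF-8: 2 bytes + NUL

// Brace-delimited array in the style of `xxd -i` output (bytes here are
// arbitrary sample data). No implicit terminator is added, and embedded
// zero bytes need no special treatment.
static const unsigned char blob[] = { 0x89, 0x50, 0x00, 0xff };
static const std::size_t blob_len = sizeof blob;

// Compile-time checks that the literals hold exactly the bytes written above.
static_assert(sizeof latin1_e_acute == 2, "one byte plus NUL");
static_assert(sizeof utf8_e_acute == 3, "two bytes plus NUL");
static_assert(sizeof blob == 4, "no terminator appended");
```

One caveat with the string-literal form: a \x escape consumes all following
hex digits, so a byte followed by a literal hex-digit character has to be
split across adjacent literals (e.g. "\xE9" "a"). The array form avoids that
and is not subject to string-literal length limits.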

>
>
> --
> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
> Software Architect - Intel System Software Products
>
>
>
> _______________________________________________
> SG16 Unicode mailing list
> Unicode_at_[hidden]
> http://www.open-std.org/mailman/listinfo/unicode
>

Received on 2019-08-13 18:55:22