On Tue, Aug 13, 2019, 6:34 PM Thiago Macieira <thiago@macieira.org> wrote:

On Monday, 12 August 2019 19:15:14 PDT Tom Honermann wrote:
> On 8/12/19 4:24 PM, Thiago Macieira wrote:
> > This has broken down in recent decades because Clang and GCC do a
> > pass-through from the source charset to the narrow execution charset. So
> > you can't get the same for non-ASCII. The following source if encoded in
> > Latin1:
> >
> > char str[] = "é";
> >
> > will not behave properly under UTF-8 execution charset at runtime. I don't
> > know if -finput-charset=latin1 makes a difference.
>
> Use of -finput-charset=latin1 does suffice for gcc to DTRT.
>
> It is a little disappointing that no warning is issued, even when
> -finput-charset=utf-8 is specified.

Right, but on the other hand that's actually nice, because you can have binary
data in your source code and not get the compiler complaining at you. So long
as you escape NULs, you could probably just dump a binary file in a raw,
narrow-character (byte) string literal.

(If anyone is thinking about doing that, I don't recommend it: you're going to
run into string literal size limits, with ICC at 512 kB and MSVC at 256 kB.
Use something like xxd -i to generate a brace-delimited array instead.)
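
For illustration, here is a rough sketch of what the array form looks like in
C++. The file name blob.bin and its six bytes are made up for the example; xxd
derives the blob_bin and blob_bin_len names from the file name. The byte_sum
helper is hypothetical too, and is only there to show that embedded NULs and
non-ASCII bytes survive untouched, with no charset conversion involved.

```cpp
#include <cstddef>

// Shape of the output of `xxd -i blob.bin` for a hypothetical 6-byte file.
// The contents are invented for this example.
unsigned char blob_bin[] = {
  0x00, 0xe9, 0xc3, 0xa9, 0xff, 0x00
};
unsigned int blob_bin_len = 6;

// Hypothetical helper: adds up every byte, including the NULs, to show that
// the length comes from the array and not from a terminator.
unsigned byte_sum(const unsigned char *data, std::size_t len) {
    unsigned sum = 0;
    for (std::size_t i = 0; i < len; ++i)
        sum += data[i];
    return sum;
}
```

Because this is an array initializer and not a string literal, the string
literal length limits should not apply, and sizeof(blob_bin) matches the
original file size exactly.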

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel System Software Products

_______________________________________________
SG16 Unicode mailing list
Unicode@isocpp.open-std.org
http://www.open-std.org/mailman/listinfo/unicode