On Monday, 12 August 2019 19:15:14 PDT Tom Honermann wrote:
> On 8/12/19 4:24 PM, Thiago Macieira wrote:
> > This has broken down in recent decades because Clang and GCC do a
> > pass-through from the source charset to the narrow execution charset. So
> > you can't get the same for non-ASCII. The following source if encoded in
> > Latin1:
> >
> > char str[] = "é";
> >
> > will not behave properly under UTF-8 execution charset at runtime. I don't
> > know if -finput-charset=latin1 makes a difference.
>
> Use of -finput-charset=latin1 does suffice for gcc to DTRT.
>
> It is a little disappointing that no warning is issued, even when
> -finput-charset=utf-8 is specified.
Right, but on the other hand that's actually nice, because you can have binary
data in your source code and not get the compiler complaining at you. So long
as you escape NULs, you could probably just dump a binary file in a raw,
narrow-character (byte) string literal.
(If anyone is thinking about doing that, I don't recommend it: you're going to
run into string literal size limits, with ICC at 512kB and MSVC at 256kB. Use
something like xxd -i to generate a brace-delimited array instead.)
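
For reference, a rough sketch of the shape xxd -i produces. The file name
blob.bin and its eight bytes are made up for illustration, and the helper
function is mine, not something xxd emits:

```c
#include <stddef.h>

/* Roughly what `xxd -i blob.bin` writes for a hypothetical 8-byte file.
   xxd derives the identifiers from the file name; the byte values below
   are invented for this example. */
unsigned char blob_bin[] = {
  0x89, 0x50, 0x4e, 0x47, 0x00, 0xff, 0x0d, 0x0a
};
unsigned int blob_bin_len = 8;

/* Hypothetical helper (not part of xxd's output): returns 1 when the
   recorded length agrees with the actual array size. */
int blob_len_is_consistent(void)
{
    return sizeof blob_bin == (size_t)blob_bin_len;
}
```

Since every element is an integer constant, no source or execution charset
conversion is involved, and embedded NUL and 0xff bytes need no special
treatment.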
AFAIK, the string literal approach only works if you use \x to escape every
byte; otherwise some implementations will mess with your data. Nothing else is
guaranteed to be passed through unchanged.
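
A minimal sketch of the fully escaped form, assuming the same made-up eight
bytes as payload (the names payload and payload_len are hypothetical):

```c
#include <stddef.h>

/* Every byte is spelled as a \x escape, so the literal contains only
   basic source characters and no charset conversion can alter it.
   The byte values are invented for this example. */
static const unsigned char payload[] = "\x89\x50\x4e\x47\x00\xff\x0d\x0a";

/* sizeof includes the implicit terminating NUL, so subtract one.
   strlen() would stop at the embedded \x00 and report 4 instead. */
static const size_t payload_len = sizeof payload - 1;
```

One thing to watch for: a \x escape consumes every hex digit that follows it,
so mixing escapes with literal characters (say, "\x0d" directly followed by
"e") changes the value. Escaping every byte avoids that, because each escape
is terminated by the next backslash or the closing quote.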