SG16

Subject: Re: [SG16-Unicode] [isocpp-core] What is the proper term for the locale dependent run-time character set/encoding used for the character classification and conversion functions?
From: Thiago Macieira (thiago_at_[hidden])
Date: 2019-08-13 11:25:57


On Monday, 12 August 2019 19:15:14 PDT Tom Honermann wrote:
> On 8/12/19 4:24 PM, Thiago Macieira wrote:
> > This has broken down in recent decades because Clang and GCC do a
> > pass-through from the source charset to the narrow execution charset. So
> > you can't get the same for non-ASCII. The following source if encoded in
> > Latin1:
> >
> > char str[] = "é";
> >
> > will not behave properly under UTF-8 execution charset at runtime. I don't
> > know if -finput-charset=latin1 makes a difference.
>
> Use of -finput-charset=latin1 does suffice for gcc to DTRT.
>
> It is a little disappointing that no warning is issued, even when
> -finput-charset=utf-8 is specified.

Right, but on the other hand that's actually nice, because you can have binary
data in your source code and not get the compiler complaining at you. So long
as you escape NULs, you could probably just dump a binary file in a raw,
narrow-character (byte) string literal.
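
A minimal sketch of what that looks like, with the bytes spelled as escapes so
the result does not depend on how the source file itself is encoded. The names
blob, byte_at and visible_length are made up for illustration, and the payload
is just "é" in UTF-8 (C3 A9), an escaped NUL, and "é" in Latin1 (E9):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Illustrative payload (not from the thread): "é" as UTF-8 (C3 A9), an
// escaped NUL, then "é" as Latin1 (E9). The compiler appends one more NUL.
const char blob[] = "\xC3\xA9\0\xE9";

// sizeof counts every byte, including the embedded and the terminating NUL.
static_assert(sizeof blob == 5, "4 payload bytes + terminator");

// Hypothetical helper: read byte i as an unsigned value.
unsigned byte_at(std::size_t i) {
    assert(i < sizeof blob);
    return static_cast<unsigned char>(blob[i]);
}

// strlen stops at the first NUL, so it only sees the two UTF-8 bytes.
std::size_t visible_length() { return std::strlen(blob); }
```

The point is that the size has to come from sizeof (or a separate length),
not from strlen, once NULs are part of the data.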

(If anyone is thinking about that, I don't recommend it. You're going to run
into string literal size limits: ICC at 512 kB and MSVC at 256 kB. Use
something like xxd -i to generate a brace-delimited array instead.)
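
For reference, a rough sketch of the shape of what "xxd -i payload.bin" emits.
The file name payload.bin and its eight bytes are invented here; xxd derives
the identifiers from the file name. byte_sum is a hypothetical helper, only
there to show the array is used like any other data:

```cpp
#include <cassert>
#include <cstddef>

// Shaped like the output of `xxd -i payload.bin`; the file name and the
// bytes are invented for this illustration.
unsigned char payload_bin[] = {
  0x89, 0x50, 0x4e, 0x47, 0x00, 0x0d, 0x0a, 0x1a
};
unsigned int payload_bin_len = 8;

// Hypothetical helper: add up the bytes, embedded NULs included.
unsigned byte_sum(const unsigned char* data, std::size_t n) {
    assert(data != nullptr);
    unsigned sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}
```

Since this is an array initializer rather than a string literal, NULs need no
escaping and the length travels alongside the data.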

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel System Software Products

SG16 list run by sg16-owner@lists.isocpp.org