sg16: Re: [SG16-Unicode] [wg14/wg21 liaison] [isocpp-core] Source file encoding (was: What is the proper term for the locale dependent run-time character set/encoding used for the character classification and conversion functions?)

From: Steve Downey <sdowney_at_[hidden]>
Date: Wed, 14 Aug 2019 21:06:58 -0400

On Wed, Aug 14, 2019 at 8:54 PM Ed Catmur via Liaison <
liaison_at_[hidden]> wrote:

> Note that the compiler already necessarily knows the source file encoding
> and the execution encoding, to be able to perform the various
> [lex.phases].
> Would it be enough or at least help to expose those, or at least the
> latter?
>

The compiler makes assumptions about the source file encoding and the
execution encoding. From the standard's perspective, the encoding depends on
locale in some unspecified way. That is, the values of characters in the
"execution character set" depend on locale. "Execution encoding" isn't
actually a term in the standard, although it's implied.

If the compiler assumes a single-byte encoding such as Latin-1, it can't tell
that the intended encoding is UTF-8. This happens all the time, and
sometimes it actually appears to work, because the string literals are
eventually interpreted as UTF-8 instead of Latin-1. Other times, mojibake
happens.
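
To make the mojibake case concrete, here is a minimal sketch, assuming 8-bit
char and some tool in the pipeline that believes the bytes are Latin-1 and
transcodes them to UTF-8. The helper name latin1_to_utf8 is hypothetical; it
is not part of the standard or of any compiler, and it only illustrates what
such a transcoding step does to bytes that were already UTF-8.

```cpp
#include <string>

// Hypothetical helper (an assumption for illustration): treat every input
// byte as one Latin-1 code point (U+0000..U+00FF) and encode it as UTF-8.
std::string latin1_to_utf8(const std::string& bytes) {
    std::string out;
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            // ASCII is identical in Latin-1 and UTF-8.
            out.push_back(static_cast<char>(c));
        } else {
            // U+0080..U+00FF become a two-byte UTF-8 sequence.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

If the source text was really UTF-8, the letter U+00E9 arrives as the two
bytes C3 A9. Passed through this helper they come out as C3 83 C2 A9, which a
UTF-8 reader shows as two unrelated characters (U+00C3 followed by U+00A9).
If no transcoding happens and the bytes are copied through untouched, a UTF-8
reader sees the intended character, which is the "appears to work" case.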

Received on 2019-08-15 03:07:12