Subject: Re: [SG16-Unicode] [wg14/wg21 liaison] [isocpp-core] Source file encoding (was: What is the proper term for the locale dependent run-time character set/encoding used for the character classification and conversion functions?)
From: Steve Downey (sdowney_at_[hidden])
Date: 2019-08-14 20:06:58

On Wed, Aug 14, 2019 at 8:54 PM Ed Catmur via Liaison <
liaison_at_[hidden]> wrote:

> Note that the compiler already necessarily knows the source file encoding
> and the execution encoding, to be able to perform the various
> [lex.phases].
> Would it be enough or at least help to expose those, or at least the
> latter?
>

The compiler makes assumptions about the source file encoding and
execution encoding. From the standard's perspective, it depends on locale, in
some unspecified way. That is, the values of characters in the "execution
character set" depend on locale. "Execution encoding" isn't actually a term
in the standard, although it's implied.

If the compiler assumes a single-byte encoding like Latin-1, it can't tell
that the intended encoding is UTF-8. This happens all the time, and
sometimes actually appears to work when the string literals are eventually
interpreted as UTF-8 instead of Latin-1. Other times, mojibake happens.
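
To make the mojibake case concrete, here is a rough sketch of what happens
when UTF-8 source bytes are assumed to be Latin-1 and then converted to a
UTF-8 execution encoding. The helper name latin1_to_utf8 is made up for
illustration; it is not what any particular compiler does internally, just
the byte-level effect of the wrong assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a real compiler or library API): treat every
// byte of `in` as a Latin-1 code point (U+0000..U+00FF) and encode that
// code point as UTF-8.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            // ASCII range: one byte, unchanged.
            out += static_cast<char>(c);
        } else {
            // U+0080..U+00FF: two-byte UTF-8 sequence.
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

Feeding it the two bytes C3 A9 (the UTF-8 encoding of U+00E9, "é") yields
the four bytes C3 83 C2 A9, which display as "Ã©" when read as UTF-8. If
instead the bytes are passed through untouched, they are still valid UTF-8
at run time, which is the "appears to work" case.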
