Date: Wed, 14 Aug 2019 21:06:58 -0400
On Wed, Aug 14, 2019 at 8:54 PM Ed Catmur via Liaison <
liaison_at_[hidden]> wrote:
>
>
>
> Note that the compiler already necessarily knows the source file encoding
> and the execution encoding, to be able to perform the various
> [lex.phases].
> Would it be enough or at least help to expose those, or at least the
> latter?
>
>
The compiler makes assumptions about the source file encoding and the
execution encoding. From the standard's perspective, the values of
characters in the "execution character set" depend on the locale, in some
unspecified way. "Execution encoding" isn't actually a term in the
standard, although it's implied.

If the compiler assumes a single-byte encoding like Latin-1, it can't tell
that the intended encoding is UTF-8. This happens all the time, and
sometimes it actually appears to work, when the string literals are
eventually interpreted as UTF-8 instead of Latin-1. Other times, mojibake
happens.
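
To make the mojibake case concrete, here is a minimal sketch. It assumes a tool that believes its input is Latin-1 and converts it to UTF-8. The helper name latin1_to_utf8 is hypothetical and is not something from the standard or any particular compiler; it only illustrates what such a conversion does to bytes that were really UTF-8 all along.

```cpp
#include <string>

// Hypothetical helper (an illustration, not a real compiler API): treat each
// byte of `bytes` as a Latin-1 code point (U+0000..U+00FF) and encode that
// code point as UTF-8.
std::string latin1_to_utf8(const std::string& bytes) {
    std::string out;
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            // ASCII is identical in Latin-1 and UTF-8: copy the byte.
            out += static_cast<char>(c);
        } else {
            // U+0080..U+00FF takes two bytes in UTF-8: 110xxxxx 10xxxxxx.
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

Feeding it "\xC3\xA9" (the UTF-8 encoding of U+00E9) yields "\xC3\x83\xC2\xA9", which a UTF-8 terminal displays as two characters (U+00C3, U+00A9) instead of one. If the bytes are instead copied through untouched and only later read as UTF-8, nothing is damaged, which would be the "appears to work" case.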
Received on 2019-08-14 20:09:11