On Wed, Aug 14, 2019 at 8:54 PM Ed Catmur via Liaison <liaison@lists.isocpp.org> wrote:



Note that the compiler already necessarily knows the source file encoding
and the execution encoding, to be able to perform the various [lex.phases].
Would it be enough or at least help to expose those, or at least the latter?


The compiler makes assumptions about the source file encoding and the execution encoding. From the standard's perspective, the execution encoding depends on locale in some unspecified way. That is, the values of characters in the "execution character set" depend on locale. "Execution encoding" isn't actually a term in the standard, although it's implied.

If the compiler assumes a single-byte encoding like Latin-1, it can't tell that the intended encoding is UTF-8. This happens all the time, and sometimes it actually appears to work when the string literals are eventually interpreted as UTF-8 instead of Latin-1. Other times, mojibake happens.
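
To make the two outcomes concrete, here is a minimal sketch. It assumes a source file whose bytes are really UTF-8 and a compiler that treats them as Latin-1. The helper latin1_to_utf8 and the function check_examples are hypothetical names for illustration only; they model what a transcoding step would do and are not any compiler's actual implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: transcode bytes that are ASSUMED to be Latin-1 into
// UTF-8. Each Latin-1 byte maps to the code point with the same value, so
// bytes >= 0x80 become two-byte UTF-8 sequences.
std::string latin1_to_utf8(const std::string& latin1) {
    std::string out;
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            out += static_cast<char>(c);                 // ASCII: unchanged
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));   // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F)); // continuation byte
        }
    }
    return out;
}

// Self-check of the two outcomes, using U+00E9 as the test character.
void check_examples() {
    // The bytes the author actually typed: UTF-8 for U+00E9.
    const std::string source_bytes = "\xC3\xA9";

    // Outcome 1: source and execution encodings are both assumed to be
    // Latin-1, so the bytes pass through unchanged. A consumer that later
    // reads them as UTF-8 sees the intended character: it "appears to work".
    const std::string passthrough = source_bytes;
    assert(passthrough == "\xC3\xA9");

    // Outcome 2: the source is assumed to be Latin-1 but the literal is
    // converted to UTF-8. Each byte is treated as its own character, giving
    // U+00C3 followed by U+00A9: mojibake.
    assert(latin1_to_utf8(source_bytes) == "\xC3\x83\xC2\xA9");
}
```

In both cases the compiler has no way to notice the mistake, because every byte sequence is valid Latin-1.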