That's what the standard refers to now as the internal encoding.

An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (e.g., using the \uXXXX notation), are handled equivalently except where this replacement is reverted ([lex.pptoken]) in a raw string literal.

Now, this doesn't quite require that the internal encoding be Unicode. If I'm reading it correctly, it could be lossy. However, given the other requirements around u literals, that's somewhat unlikely. It might be worth exploring making it an explicit requirement that the internal encoding be some unspecified Unicode encoding form, so even if it's UTF-EBCDIC, that's OK.

All of this language in the standard seems to have been drafted between 1994 and 1998, and doesn't correspond well to current nomenclature around character encodings. It also comes from a time when it wasn't clear that programs would routinely have to deal with multiple encodings at once during their lifetime, and that one of the most common would be a multibyte encoding.

On Thu, Aug 15, 2019, 07:55 Lyberta <> wrote:
There is so much discussion and misunderstandings about C++ charsets in
the adjacent thread and on the Internet. Maybe we can simplify this a bit.

I propose we add an "Intermediate Character Set" and define it as an
implementation-defined Unicode encoding form.

Then we add rules like these:

When compiling a TU, text in the source charset gets converted to the
intermediate charset before preprocessing. This eliminates any ambiguity
about string literals and comments.

Pretty much all text operations during compilation work in terms of the
intermediate charset.

As the last step, before writing an object file, text data gets converted
to the various "execution" encodings.

This will allow us to write standardese in the framework of Unicode but
still allow exotic charsets as input and output.

SG16 Unicode mailing list