That's what the standard now refers to as the internal encoding.
An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (e.g., using the \uXXXX notation), are handled equivalently except where this replacement is reverted ([lex.pptoken]) in a raw string literal.
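A minimal sketch of what that equivalence requires in practice, assuming a C++11 compiler with UTF-8 source and execution encodings (the literals are illustrative, not from the standard):

```cpp
#include <cassert>
#include <cstring>

int main() {
    // Phase 1 maps the actual extended character to the same thing as
    // its universal-character-name spelling, so these two string
    // literals must be identical.
    const char* via_ucn = "caf\u00E9"; // é written as a UCN
    const char* direct  = "café";      // é written as an actual extended character
    assert(std::strcmp(via_ucn, direct) == 0);

    // In a raw string literal the replacement is reverted: the six
    // source characters \ u 0 0 E 9 are kept literally.
    const char* raw = R"(caf\u00E9)";
    assert(std::strcmp(raw, "caf\\u00E9") == 0);
}
```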
Now, this doesn't quite require that the internal encoding be Unicode. If I'm reading it correctly, it could even be lossy. However, given the other requirements around u literals, that seems unlikely. It might be worth exploring making it an explicit requirement that the internal encoding be some unspecified Unicode transformation format, so that even UTF-EBCDIC would be acceptable.
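To illustrate why lossiness seems implausible: char16_t literals are specified to use UTF-16, so the implementation must be able to recover the exact Unicode code point of every extended character that passes through its internal encoding. A sketch, again assuming a C++11 compiler:

```cpp
#include <cassert>

int main() {
    // U+1F600 lies outside the BMP, so UTF-16 must encode it as a
    // surrogate pair. A lossy internal encoding could not reliably
    // produce these exact code units.
    const char16_t* s = u"\U0001F600";
    assert(s[0] == 0xD83D && s[1] == 0xDE00);
}
```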
All of this language in the standard seems to have been drafted between 1994 and 1998, and doesn't correspond well to current nomenclature around character encodings. It also dates from a time when it wasn't clear that programs would routinely have to deal with multiple encodings simultaneously during their lifetime, or that one of the most common of those would be a multibyte encoding.