Date: Thu, 15 Aug 2019 11:54:00 +0000
There is so much discussion and misunderstanding about C++ character sets
in the adjacent thread and on the Internet. Maybe we can simplify this a bit.
I propose we add an "Intermediate Character Set" and define it as an
implementation-defined Unicode encoding form.
Then we add rules like these:
When compiling a TU, text in the source charset gets converted to the
intermediate charset before preprocessing. This eliminates any ambiguity
about string literals and comments.
Pretty much all text operations during compilation work in terms of the
intermediate charset.
As the last step before writing an object file, text data gets converted
to the various "execution" encodings.
This will allow us to write standardese in the framework of Unicode but
still allow exotic charsets as input and output.
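
To make the three steps concrete, here is a rough sketch of the pipeline,
not a proposal for any actual interface. It assumes ISO-8859-1 as the
source charset, UTF-32 as the intermediate charset and UTF-8 as the one
execution encoding; the function names source_to_intermediate and
intermediate_to_execution are made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical step 1: source charset -> intermediate charset.
// Assumes the source file is ISO-8859-1 and the intermediate form is UTF-32.
std::u32string source_to_intermediate(std::string_view latin1_bytes) {
    std::u32string out;
    out.reserve(latin1_bytes.size());
    for (unsigned char byte : latin1_bytes) {
        // Each ISO-8859-1 byte value is numerically equal to its code point.
        out.push_back(static_cast<char32_t>(byte));
    }
    return out;
}

// Hypothetical last step: intermediate charset -> execution encoding.
// Assumes the execution encoding is UTF-8.
std::string intermediate_to_execution(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        // Only Unicode scalar values are valid in the intermediate form.
        assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

Everything between the two calls (preprocessing, literal concatenation and
so on) would only ever see the std::u32string, which is the point of the
intermediate charset: the exotic encodings stay at the edges.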
Received on 2019-08-15 13:54:59
