Corentin Jabot wrote:
4 and 5 are inventing a new "portable" encoding. One that is portable to a couple of functions of C++ but not, for example, to a Python binding generator. Or to any other library in C++ or in another language, or to any tool. While it does technically solve the problem of not producing mojibake, it fails to solve the problem in a meaningful, user-friendly way. Overwhelmingly, identifiers will consist of basic characters. When they don't, whoever gets an error in Python or JavaScript saying that \ is not valid in an identifier will have a surprise. Moreover, I'm extremely opposed to producing runtime strings that mirror lexing elements; it is just too confusing to teach to people.

But they (4 and 5) do have the advantage of being no-ops when the literal encoding is UTF-8, so we get what we want: perfectly legal char[] strings in UTF-8, and everything works.
I wrote option 4 incorrectly. What I intended was:
4. Specify a way to encode non-representable characters outside the
basic literal character set in the names returned by the above
string() and wstring() member functions and specify that
data_member_spec() accepts such encoded names.
Options 4 and 5 are not necessarily no-ops; that depends on other implementation details. But they could be engineered to enable no-ops.
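To make the "could be engineered to enable no-ops" point concrete, here is a rough sketch of what such an encoding might look like. Everything in it is an assumption for illustration only: the function name encode_name, the \u{...} escape syntax, and the literal_is_utf8 flag are hypothetical and not taken from any proposal. It assumes the name arrives as well-formed UTF-8 and treats every code point outside ASCII as needing an escape.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical helper (not from any proposal): returns `utf8` unchanged when
// the ordinary literal encoding is UTF-8, otherwise rewrites every non-ASCII
// code point as an assumed \u{X...} escape. Input must be well-formed UTF-8.
std::string encode_name(std::string_view utf8, bool literal_is_utf8) {
    if (literal_is_utf8) {
        return std::string(utf8);  // the no-op case
    }
    std::string out;
    for (std::size_t i = 0; i < utf8.size();) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        if (lead < 0x80) {  // ASCII passes through untouched
            out += static_cast<char>(lead);
            ++i;
            continue;
        }
        // Number of continuation bytes implied by the lead byte.
        const int extra = lead >= 0xF0 ? 3 : lead >= 0xE0 ? 2 : 1;
        if (i + extra >= utf8.size()) {
            break;  // truncated sequence; stop rather than read out of bounds
        }
        std::uint32_t cp = lead & (0x3F >> extra);
        for (int k = 1; k <= extra; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        }
        char buf[16];
        std::snprintf(buf, sizeof buf, "\\u{%X}", static_cast<unsigned>(cp));
        out += buf;
        i += static_cast<std::size_t>(extra) + 1;
    }
    return out;
}
```

With this sketch, encode_name("caf\xC3\xA9", true) returns the input bytes unchanged, while encode_name("caf\xC3\xA9", false) returns the ten characters caf\u{E9}. The second result is the kind of string Corentin is objecting to: it is fine as input to data_member_spec(), but it is not a valid identifier for any other consumer.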
Option 3 also provides the desired behavior for those using UTF-8
as the ordinary literal encoding. It makes a trade-off between
guaranteed portable code that can handle all possible identifiers
and the introduction of the new encoding provided by options 4 and
5.
Tom.