On 4/29/24 6:38 PM, Peter Dimov via SG16 wrote:

Corentin Jabot wrote:

4 and 5 are inventing a new "portable" encoding,
one that is portable to a couple of functions in C++ but not to, for example, a Python
binding generator, nor to any other library in C++ or in another language, nor to any
tool.
While they do technically avoid producing mojibake, they also
fail to solve the problem in a meaningful, user-friendly way.

Overwhelmingly, identifiers will consist of basic characters. When they don't,
whoever gets an error in Python or JavaScript that \ is not valid in an identifier will
be surprised.
Moreover, I'm extremely opposed to producing runtime strings that mirror
lexing elements; it is just too confusing to teach to people.

But they (4 and 5) do have the advantage of being no-ops when the literal
encoding is UTF-8, so we get what we want - perfectly legal char[] strings in
UTF-8, everything works.

I wrote option 4 incorrectly. What I intended was:

4. Specify a way to encode ~~non-representable~~ characters outside the basic literal character set in the names returned by the above string() and wstring() member functions and specify that data_member_spec() accepts such encoded names.
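
To make the shape of such an encoding concrete, here is a minimal sketch. Nothing in it is specified anywhere in this thread: the `\u{HEX}` spelling, the function names `encode_name` and `is_basic_identifier_char`, and the choice to pass only letters, digits, and underscore through unchanged are all assumptions made for illustration. It also assumes an ASCII-compatible ordinary literal encoding.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper (not from any proposal): true for code points that can
// appear in an identifier spelled only with basic characters.
constexpr bool is_basic_identifier_char(char32_t c) {
    return (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z') ||
           (c >= U'0' && c <= U'9') || c == U'_';
}

// Hypothetical encoder: copies basic characters through unchanged and writes
// every other code point as \u{HEX}, without leading zeros.
// Assumes the ordinary literal encoding is ASCII-compatible.
std::string encode_name(std::u32string_view name) {
    static constexpr char hex[] = "0123456789ABCDEF";
    std::string out;
    for (char32_t c : name) {
        assert(c <= 0x10FFFF && "not a Unicode code point");
        if (is_basic_identifier_char(c)) {
            out += static_cast<char>(c);
            continue;
        }
        out += "\\u{";
        bool started = false;
        for (int shift = 20; shift >= 0; shift -= 4) {
            unsigned digit = (c >> shift) & 0xF;
            // Skip leading zero digits, but always emit the last digit.
            if (digit != 0 || started || shift == 0) {
                out += hex[digit];
                started = true;
            }
        }
        out += '}';
    }
    return out;
}
```

Under these assumptions, `encode_name(U"count")` returns the name unchanged, while `encode_name(U"caf\u00E9")` returns the ten characters `caf\u{E9}`. The second result is the kind of string Corentin is concerned about: it is valid in the ordinary literal encoding, but a tool outside C++ that receives it will see a backslash it does not accept in an identifier.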

Options 4 and 5 are not necessarily no-ops; that depends on other implementation details. But they could be engineered to enable no-ops.

Option 3 also provides the desired behavior for those using UTF-8 as the ordinary literal encoding. It makes a trade-off between guaranteed portable code that can handle all possible identifiers and the introduction of the new encoding provided by options 4 and 5.

Tom.