Date: Mon, 29 Apr 2024 19:03:03 -0400
On 4/29/24 6:38 PM, Peter Dimov via SG16 wrote:
> Corentin Jabot wrote:
>> 4, and 5 are inventing a new "portable" encoding.
>> One that is portable to a couple functions of C++ but not for example a python
>> binding generator. Or any other library in C++ or in another language, nor any
>> tool.
>> While it does solve the problem of technically not producing mojibake, it also
>> fails to solve the problem in a meaningful, user-friendly way.
>>
>> Overwhelmingly identifiers will be basic characters. When they aren't,
>> whoever gets an error in python or javascript that \ is not a valid identifier will
>> have a surprise.
>> Moreover, I'm extremely opposed to producing runtime strings that mirror
>> lexing elements; it is just too confusing to teach to people.
> But they (4 and 5) do have the advantage of being no-ops when the literal
> encoding is UTF-8, so we get what we want - perfectly legal char[] strings in
> UTF-8, everything works.
I wrote option 4 incorrectly. What I intended was:
4. Specify a way to encode non-representable characters _outside the
basic literal character set_ in the names returned by the above string()
and wstring() member functions and specify that data_member_spec()
accepts such encoded names.
Options 4 and 5 are not necessarily no-ops; that depends on other
implementation details. But they could be engineered to enable no-ops.
Option 3 also provides the desired behavior for those using UTF-8 as the
ordinary literal encoding. It makes a trade-off between guaranteed
portable code that can handle all possible identifiers and the
introduction of the new encoding provided by options 4 and 5.
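To make the shape of option 4 concrete, here is a minimal sketch of one
possible encoding. Everything in it is an assumption for illustration: the
function name encode_name, the \u{...} escape form, and the approximation of
the basic literal character set by printable ASCII are not taken from any
proposal, and the code assumes an ASCII-compatible ordinary literal encoding.

```cpp
#include <string>
#include <string_view>

// Hypothetical encoder (not from any proposal): code points in printable
// ASCII, used here as a stand-in for the basic literal character set, pass
// through unchanged; every other code point becomes \u{HEX}.
// Assumes the ordinary literal encoding is ASCII-compatible.
std::string encode_name(std::u32string_view name) {
    static constexpr char hex[] = "0123456789ABCDEF";
    std::string out;
    for (char32_t c : name) {
        if (c >= 0x20 && c <= 0x7E) {
            // Pass-through case: the name is left untouched.
            out.push_back(static_cast<char>(c));
        } else {
            out += "\\u{";
            bool started = false;
            // Emit hex digits from the most significant nibble down,
            // skipping leading zeros but always emitting the last digit.
            for (int shift = 28; shift >= 0; shift -= 4) {
                unsigned digit = (c >> shift) & 0xF;
                if (digit != 0 || started || shift == 0) {
                    out.push_back(hex[digit]);
                    started = true;
                }
            }
            out.push_back('}');
        }
    }
    return out;
}
```

With this sketch, a name made only of basic characters is returned
unchanged, while a name containing U+00E9 would come back with that code
point spelled as \u{E9}. A consumer in another language would see the
backslash, which is the surprise Corentin describes above.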
Tom.
Received on 2024-04-29 23:03:10