Date: Wed, 8 May 2024 15:25:51 -0400
On 5/8/24 2:59 PM, Peter Dimov wrote:
> Tom Honermann wrote:
>> This keeps neglecting the basic fact that there are implementations and
>> ecosystems that cannot adopt what you are suggesting. Not now, not in the
>> near term, probably never.
> Would you please give one concrete example of such an implementation
> or an ecosystem, and how translating Unicode literals to the _ordinary_
> literal encoding on stream insertion would be a problem there?
Any EBCDIC-based system, such as z/OS.
C++ code can't distinguish between literals and non-literals (except for
UDLs, but that is irrelevant here), but I don't think you intended to
constrain the question to Unicode literals.
UTF-8 solves problems with mojibake. It does not solve problems with
translations. Let's go back to a variation of an example I gave earlier
that uses a hypothetical message catalog similar to GNU gettext() to
provide translations of strings in UTF-8 in char8_t.
    std::cout << u8msg("In the month of ") << std::chrono::August << "\n";
Say the ordinary literal encoding is IBM-1047. Translation to the
ordinary literal encoding will limit the output to characters
representable in that encoding; any other characters would presumably be
replaced with substitution characters. If the program is run in an
IBM-1047 environment, there is no problem. Now run that program in an
environment with a Japanese locale using code page 954 (euc-jp). The
message catalog lookup would produce a UTF-8 string that probably uses
characters not in IBM-1047. Conversion to code page 954 will likely
preserve those characters while conversion to IBM-1047 definitely would not.
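To make the loss concrete, here is a minimal sketch that simulates the two conversions. It is not a real transcoder: the output stays in UTF-8 so it can be inspected, and the two repertoire predicates are rough assumptions (IBM-1047 is treated as the Latin-1 repertoire, and EUC-JP as ASCII plus kana plus the CJK ideograph block). The names decode_one, in_ibm1047, in_eucjp, and simulate_conversion are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Decode one UTF-8 sequence starting at s[i] and advance i past it.
// Assumes well-formed input; a real decoder would validate.
char32_t decode_one(std::string_view s, std::size_t& i) {
    unsigned char lead = static_cast<unsigned char>(s[i++]);
    if (lead < 0x80) return lead;
    int extra = (lead >= 0xF0) ? 3 : (lead >= 0xE0) ? 2 : 1;
    char32_t cp = lead & (0x3F >> extra);  // keep the payload bits of the lead byte
    while (extra-- > 0 && i < s.size())
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    return cp;
}

// Assumption: IBM-1047 is approximated by the Latin-1 repertoire (U+0000..U+00FF).
bool in_ibm1047(char32_t cp) { return cp <= 0xFF; }

// Assumption: a crude stand-in for EUC-JP (ASCII, kana, CJK ideographs).
// The real repertoire is defined by the JIS character sets and is narrower.
bool in_eucjp(char32_t cp) {
    return cp < 0x80 || (cp >= 0x3040 && cp <= 0x30FF) ||
           (cp >= 0x4E00 && cp <= 0x9FFF);
}

// Copy the input, replacing every character the target repertoire lacks
// with a substitution character.
std::string simulate_conversion(std::string_view utf8,
                                bool (*representable)(char32_t),
                                char sub = '?') {
    assert(static_cast<unsigned char>(sub) < 0x80);  // keep the output valid UTF-8
    std::string out;
    for (std::size_t i = 0; i < utf8.size();) {
        std::size_t start = i;
        char32_t cp = decode_one(utf8, i);
        if (representable(cp))
            out.append(utf8.substr(start, i - start));
        else
            out.push_back(sub);
    }
    return out;
}
```

With a translated string such as "8\xE6\x9C\x88" (the digit 8 followed by U+6708), simulate_conversion(..., in_ibm1047) yields "8?", while simulate_conversion(..., in_eucjp) returns the string unchanged. An untranslated "August" survives both.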
Tom.
Received on 2024-05-08 19:25:53