On 5/8/24 2:59 PM, Peter Dimov wrote:

> Tom Honermann wrote:
>
>> This keeps neglecting the basic fact that there are implementations and
>> ecosystems that cannot adopt what you are suggesting. Not now, not in the
>> near term, probably never.
>
> Would you please give one concrete example of such an implementation
> or an ecosystem, and how translating Unicode literals to the _ordinary_
> literal encoding on stream insertion would be a problem there?

Any EBCDIC-based system, such as z/OS.

C++ code can't distinguish between literals and non-literals (UDLs excepted, though that is irrelevant here). In any case, I don't think you intended to constrain the question to Unicode literals.

UTF-8 solves problems with mojibake. It does not solve problems with translations. Let's go back to a variation of an example I gave earlier, which uses a hypothetical message catalog, similar to GNU gettext(), that provides translated strings as UTF-8 in char8_t:

std::cout << u8msg("In the month of ") << std::chrono::August << "\n";

Say the ordinary literal encoding is IBM-1047. Translating to the ordinary literal encoding limits the output to characters representable in that encoding; any other characters would presumably be replaced with substitution characters. If the program runs in an IBM-1047 environment, there is no problem. Now run the same program in an environment with a Japanese locale that uses code page 954 (EUC-JP). The message catalog lookup would produce a UTF-8 string that probably contains characters not in IBM-1047. Conversion to code page 954 would likely preserve those characters, while conversion to IBM-1047 definitely would not.
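To make the difference concrete, here is a minimal C++17 sketch. It does not perform real transcoding; it only counts how many code points a converter would have to replace with a substitution character. Everything in it is an assumption for illustration: the helper names (in_ibm1047, in_eucjp_approx, substitutions) are hypothetical, the IBM-1047 check relies on that code page covering the Latin-1 repertoire, the EUC-JP check is a crude block-level approximation rather than the actual JIS tables, and the Japanese string stands in for whatever the catalog lookup might return.

```cpp
#include <cstddef>
#include <string_view>

// Assumption: IBM-1047 covers the Latin-1 repertoire (U+0000..U+00FF).
constexpr bool in_ibm1047(char32_t c) { return c <= 0xFF; }

// Crude approximation of EUC-JP: ASCII plus the kana and CJK ideograph
// blocks. The real JIS repertoire is smaller than these blocks.
constexpr bool in_eucjp_approx(char32_t c) {
    return c <= 0x7F
        || (c >= 0x3040 && c <= 0x30FF)   // Hiragana and Katakana
        || (c >= 0x4E00 && c <= 0x9FFF);  // CJK Unified Ideographs
}

// Count the code points that a transcoder to the target encoding would
// have to replace with a substitution character.
constexpr std::size_t substitutions(std::u32string_view text,
                                    bool (*representable)(char32_t)) {
    std::size_t n = 0;
    for (char32_t c : text)
        if (!representable(c)) ++n;
    return n;
}

// Hypothetical catalog result in a Japanese locale: "8" followed by
// U+6708 (month) and U+306B (the particle "ni"), roughly "in August".
constexpr std::u32string_view msg = U"8\u6708\u306B";

// Going through the ordinary literal encoding loses both Japanese characters.
static_assert(substitutions(msg, in_ibm1047) == 2);
// Converting straight to the locale encoding loses nothing.
static_assert(substitutions(msg, in_eucjp_approx) == 0);
```

The point of the sketch is only that the loss happens at the intermediate step: once the text has been squeezed through IBM-1047, no later conversion to the locale encoding can bring the characters back.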

Tom.