ISOCPP sg16 List: Re: [isocpp-sg16] Follow up on SG16 review of P2996R2 (Reflection for C++26)

From: Tiago Freire <tmiguelf_at_[hidden]>
Date: Thu, 9 May 2024 07:34:32 +0000

> Use of a UTF encoding as the intermediate encoding enables transformation and operations on all inputs without having to track an associated encoding throughout the program.

Why? Why would tracking be a problem?
My IO has a specific encoding; if I know what that is, I can make it work. My internal string handling mechanism has a specific encoding; if I know what it is, I can make that work.
I can transcode between the two, and when that doesn’t work I can decide what the best course of action is for whatever it is that I’m doing.
Why do we have to solve “what happens if the conversion doesn’t happen smoothly”?
Sometimes it is not possible to transform it smoothly. Sure, let’s just accept that as a fact of reality. But the problem is the middleman: it can’t know what we want to do, and as a consequence it doesn’t know what an acceptable alternative is for the user. We are trying to do too much.

> Historically, locale and encoding have been inseparable and continue to be intertwined on almost all operating systems (I think macOS might be the only exception? Perhaps one or more of the BSDs?). The reason for this discussion is because iostreams consults a locale by default and produces text in the locale encoding.

The reason formatting is tied to encoding is just a practicality of how things have been implemented. When you format text, that text needs to be output in some encoding (although in many cases it doesn’t actually have to be this way).
But as has already been shown, the encoding that comes out of the locale doesn’t match the encoding of the terminal, hence the mojibake.
What was forgotten is that there needs to be a transcoding step between the string that comes out after the parameters have gone through the “locale transformation” and the output to the terminal itself.
You have data A, you transform it through the locale, which produces data in encoding B, but your IO expects encoding C, and we don’t transcode it… well, is it any surprise that it doesn’t do the right thing?
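
A small sketch of the missing step, under assumptions I am picking purely for illustration: encoding B is Latin-1, encoding C is UTF-8, and format_month_latin1 is a made-up stand-in for whatever the locale machinery would produce. None of these names come from the standard library.

```cpp
#include <string>

// Hypothetical stand-in for locale-driven formatting: pretend the locale
// produced the French month name "fevrier" (with an acute e) as Latin-1
// bytes, i.e. encoding B.
std::string format_month_latin1() {
    return "f\xE9vrier";
}

// Transcode Latin-1 to UTF-8. Latin-1 byte N is code point U+00NN, so this
// direction always succeeds.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    for (unsigned char b : in) {
        if (b < 0x80) {
            out.push_back(static_cast<char>(b));
        } else {
            // Code points U+0080..U+00FF take two bytes in UTF-8.
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
    }
    return out;
}

// The step that gets skipped: convert B to C before the bytes reach the
// terminal.
std::string to_terminal_bytes(const std::string& locale_output) {
    return latin1_to_utf8(locale_output);
}
```

Writing format_month_latin1() straight to a UTF-8 terminal sends the lone byte 0xE9, which is not valid UTF-8 and shows up as garbage. Writing to_terminal_bytes(format_month_latin1()) sends 0xC3 0xA9 instead, which the terminal can display.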

But that has nothing to do with why you couldn’t make std::cout << u8"string" work, and everything to do with the fact that the “locale” system is a broken tool that is bad at internationalization, which std::cout doesn’t have to use but uses anyway.

And FYI, I’m not advocating changing the behavior of iostream. It has too many barnacles IMO and should be abandoned.

Received on 2024-05-09 07:34:37