On Thu, Jun 2, 2022 at 4:53 PM Thiago Macieira via SG16 <sg16@lists.isocpp.org> wrote:
On Thursday, 2 June 2022 13:34:34 PDT Zach Laine via SG16 wrote:
> So then 100% of those people will say "always".  Other people not
> writing those kinds of programs will answer differently, right?  I
> think I'm missing something.

The point is that it's 100% for nearly 100% of the applications, which means
it's not useful data. And this also changes depending on whether the developer
knows that Windows is UTF-16 behind the scenes or not. If they do, then
virtually 100% of Windows applications deal with at least two encodings, one
of which is not Unicode or US-ASCII.


I don't think that's entirely true across domains, and it's possibly not true even on Windows. At Bloomberg, 95% of programmers are only ever going to touch text that is encoded in UTF-8 and will never deal with C++ or POSIX locales. This is strikingly different from 10 years ago, when people had to know about encodings and languages in order to process text correctly, although many ended up doing the right thing by accident by doing nothing. Returning text that a user entered back to that user tends to work no matter the encoding, after all, and most fixed text has been localized using a label system for more than 20 years. Our logs would get broken, but that was not visible to users. On Windows, some systems involve transcoding between UTF-8 and UTF-16, but many programs just stay in wchar_t, or the translation is hidden from most application programmers and left to infrastructure.

We can fine-tune the question, but I think something in this vicinity is interesting.