Date: Thu, 02 Jun 2022 18:16:50 -0700
On Thursday, 2 June 2022 17:53:36 PDT Steve Downey wrote:
> I don't think that's entirely true across domains, and possibly not true
> even on Windows. At Bloomberg 95% of programmers are only ever going to
> touch text that is encoded in UTF-8 and will never ever deal with C++ or
> POSIX locales. This is strikingly different from 10 years ago, when people
> had to know about encodings and languages in order to process text
> correctly, although many ended up doing the right thing by accident by
> doing nothing. Returning text to a user that they entered tends to work no
> matter the encoding, after all, and most fixed text has been localized
> using a label system for more than 20 years. Our logs would get broken, but
> that's not visible to users. Windows involves, for some systems, transcoding between
> UTF-8 and -16, but many programs just stay in wchar_t. Or the translation
> is hidden from most application programmers and left to infrastructure.
>
> We can fine tune the question, but I think something in this vicinity is
> interesting.
I'm not opposing the question. There is something I want to know there too,
but I don't want to usurp Zach's question so I am asking him what he meant.
In your case, for example, if you're dealing with UTF-8 on Windows, which is
the proper answer:
a) I only use one encoding, which is a Unicode one
b) I use two encodings, UTF-8 for my file's contents and UTF-16 because all my
system calls are UTF-16. This may even be three encodings, depending on just
which API you're using to open those files in the first place -- if you've used
fopen(), it's probably CP_ACP.
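The UTF-8/UTF-16 boundary in option (b) is the transcoding step that Windows'
`MultiByteToWideChar` performs before a `*W` system call. A minimal,
platform-independent sketch of that conversion (the `utf8_to_utf16` helper,
its name, and its error handling are illustrative, not code from any of the
APIs mentioned above; it rejects malformed input but does not validate
overlong sequences or lone surrogates):

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Decode UTF-8 bytes into UTF-16 code units, emitting a surrogate
// pair for any code point above U+FFFF. Throws on malformed input.
std::u16string utf8_to_utf16(const std::string &in) {
    std::u16string out;
    for (std::size_t i = 0; i < in.size(); ) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t len;
        if (b < 0x80)                { cp = b;        len = 1; }  // ASCII
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + len > in.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t j = 1; j < len; ++j) {
            unsigned char c = static_cast<unsigned char>(in[i + j]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += len;
        if (cp <= 0xFFFF) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // split into a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

The third encoding in option (b) is exactly what this sketch does not touch:
if the file name went through `fopen()` on Windows, the bytes were interpreted
in the active code page (CP_ACP), not in either Unicode encoding.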
-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Software Architect - Intel DPG Cloud Engineering
Received on 2022-06-03 01:16:52