Date: Thu, 2 Jun 2022 20:53:36 -0400
On Thu, Jun 2, 2022 at 4:53 PM Thiago Macieira via SG16 <
sg16_at_[hidden]> wrote:
> On Thursday, 2 June 2022 13:34:34 PDT Zach Laine via SG16 wrote:
> > So then 100% of those people will say "always". Other people not
> > writing those kinds of programs will answer differently, right? I
> > think I'm missing something.
>
> The point is that it's 100% for nearly 100% of the applications, which means
> it's not useful data. And this also changes depending on whether the developer
> knows that Windows is UTF-16 behind the scenes or not. If they do, then
> virtually 100% of Windows applications deal with at least two encodings, one
> of which is not Unicode or US-ASCII.
>
>
I don't think that's entirely true across domains, and possibly not true
even on Windows. At Bloomberg 95% of programmers are only ever going to
touch text that is encoded in UTF-8 and will never ever deal with C++ or
POSIX locales. This is strikingly different from 10 years ago, when people
had to know about encodings and languages in order to process text
correctly, although many ended up doing the right thing by accident by
doing nothing. Returning text to a user that they entered tends to work no
matter the encoding, after all, and most fixed text has been localized
using a label system for more than 20 years. Our logs would get broken, but
that was not visible to users. Windows involves, for some systems, transcoding
between UTF-8 and UTF-16, but many programs just stay in wchar_t. Or the translation
is hidden from most application programmers and left to infrastructure.
We can fine-tune the question, but I think something in this vicinity is
interesting.
Received on 2022-06-03 00:53:50