Date: Tue, 30 Apr 2024 07:58:19 -0700
On Monday 29 April 2024 22:00:48 GMT-7 Jens Maurer via SG16 wrote:
> > older versions of some Microsoft libraries, I think including the
> > standard library, were unable to
> > accommodate encodings that require more than two bytes to encode a
> > character and those libraries have been statically linked into many
> > executables that remain in use according to their internal testing.
>
> And those libraries are limited to UCS-2 anyway, according to your
> description.
That's not necessarily correct. The issue is whether the 8-bit encoding
supports characters that need more than two bytes (code units). I think GB2312
is limited to 2 bytes in 8-bit mode and maps only to the Basic Multilingual
Plane, but I don't know whether that is also true for GBK and MS code page 936.
But you're probably right in practice: code that old will likely have trouble
with UTF-16 surrogate pairs anyway.
Anyway, the problem is not the W (wide-character) side of the equation. It's
that those libraries convert back and forth between W and A (narrow, code-page)
strings and will have trouble when a single wchar_t maps to more than two bytes.
-- Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org Principal Engineer - Intel DCAI Cloud Engineering
Received on 2024-04-30 14:58:21