sg16

Re: Follow up on SG16 review of P2996R2 (Reflection for C++26)

From: Tom Honermann <tom_at_[hidden]>
Date: Wed, 1 May 2024 16:16:09 -0400
On 4/30/24 10:58 AM, Thiago Macieira via SG16 wrote:
> On Monday 29 April 2024 22:00:48 GMT-7 Jens Maurer via SG16 wrote:
>>> older versions of some Microsoft
>>>
>>> libraries, I think including the standard library, were unable to
>>> accommodate encodings that require more than two bytes to encode a
>>> character and those libraries have been statically linked into many
>>> executables that remain in use according to their internal testing.
>> And those libraries are limited to UCS-2 anyway, according to your
>> description.
> That's not necessarily correct. The issue is whether the 8-bit encoding
> supports more than two bytes (code units) or not. I think GB2312 is limited to
> 2 bytes in 8-bit mode and maps to only the Basic Multilingual Plane, but I
> don't know if that is true for GBK and MS code page 936.
>
> But you're probably right in practice: any code that is that old is probably
> going to have trouble with UTF-16 surrogate pairs anyway.
>
> Anyway, the problem is not the W side of the equation. It's that those
> libraries will convert back and forth between W and A and will have trouble
> when a single wchar_t maps to more than two bytes.
>
Yes, Thiago described what I meant very well.

UTF-8 requires three bytes for code points in the range U+0800 through
U+FFFF.

I don't think any other Microsoft-supported encoding utilizes more than
two bytes to encode a single character.

Tom.

Received on 2024-05-01 20:16:11