sg16

Re: [isocpp-sg16] Use cases for user construction of text_encoding by name

From: Henri Sivonen <hsivonen_at_[hidden]>
Date: Sun, 21 Jul 2024 21:39:54 +0300
On Sun, Jul 21, 2024, at 6:38 PM, Thiago Macieira via SG16 wrote:
> In any case, the environment encoding is probably only going to have two
> answers:
>
> a) the Windows codepage or an identifier from it
> b) UTF-8
>
> The Standard can't rely on this or mandate it, but it's likely going to be the
> end result. So the answer to your question should be an identifier that can be
> reconstructed properly on Windows with their API and with ICU.
>
> Is there such a 1:1 mapping?

I believe not: Windows code pages 950 (Traditional Chinese) and 949 (Korean) don't appear to have IANA registrations. They differ from Big5 and EUC-KR in a way analogous to how windows-1252 differs from ISO-8859-1, how windows-31j differs from Shift_JIS, and how GBK differs from GB2312.
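The windows-1252 / ISO-8859-1 pair is the easiest of these analogies to show in a few lines: the two agree on every byte except 0x80 through 0x9F, where ISO-8859-1 has C1 control characters and windows-1252 has printable characters such as the euro sign. The following is a minimal C++17 sketch, not taken from any library. The function names `decode_iso_8859_1` and `decode_windows_1252` are hypothetical. The table follows the WHATWG Encoding Standard's windows-1252 index, which maps the five bytes Microsoft leaves unassigned to the matching C1 controls. Code pages 949 and 950 extend EUC-KR and Big5 in the same spirit, but their tables are far too large to reproduce here.

```cpp
#include <array>

// Code points for bytes 0x80..0x9F in windows-1252, per the WHATWG index
// (assumption: bytes 0x81, 0x8D, 0x8F, 0x90, 0x9D map to the same-valued C1 controls).
constexpr std::array<char32_t, 32> kWindows1252High = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, // 0x80-0x87
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F, // 0x88-0x8F
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014, // 0x90-0x97
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178, // 0x98-0x9F
};

// ISO-8859-1 maps every byte to the code point with the same numeric value.
constexpr char32_t decode_iso_8859_1(unsigned char byte) {
    return static_cast<char32_t>(byte);
}

// windows-1252 agrees with ISO-8859-1 everywhere except 0x80..0x9F.
constexpr char32_t decode_windows_1252(unsigned char byte) {
    if (byte >= 0x80 && byte <= 0x9F) {
        return kWindows1252High[byte - 0x80];
    }
    return static_cast<char32_t>(byte);
}
```

For example, byte 0x80 decodes to U+20AC (EURO SIGN) as windows-1252 but to the control character U+0080 as ISO-8859-1. This is why labeling windows-1252 data as ISO-8859-1 silently changes its meaning.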

It seems unfortunate that an API that is supposed to identify execution encodings can't do so on the one mainstream platform where the answer isn't consistently UTF-8.

(In contrast to 949 and 950, note that https://www.iana.org/assignments/charset-reg/GBK designates windows-936 as an alias and https://www.iana.org/assignments/charset-reg/windows-31J links to http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT .)
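To illustrate the gap, here is a minimal C++17 sketch of a lookup that tries to turn a Windows code page number into an IANA-registered charset name. Such a number could come, for example, from the Win32 `GetACP()` call. The function `iana_name_for_code_page` is hypothetical, and the table is deliberately partial: it covers only the code pages discussed in this thread. It does not use C++26 `std::text_encoding` itself. It only shows that, because `std::text_encoding` identifies encodings via the IANA registry, there is no registered name to hand it for 949 or 950.

```cpp
#include <optional>
#include <string_view>

// Hypothetical helper: maps a Windows code page number to an IANA-registered
// charset name, or std::nullopt when no registration matches the code page.
// Partial table covering only the code pages mentioned in this thread.
constexpr std::optional<std::string_view> iana_name_for_code_page(unsigned code_page) {
    switch (code_page) {
    case 65001: return "UTF-8";
    case 1252:  return "windows-1252";
    case 932:   return "Windows-31J";  // registration links to Microsoft's CP932.TXT
    case 936:   return "GBK";          // registration lists windows-936 as an alias
    // 949 (Korean) and 950 (Traditional Chinese) have no IANA registration of
    // their own. EUC-KR and Big5 are registered, but they name different
    // (smaller) encodings, so returning them would misidentify the data.
    case 949:
    case 950:
    default:
        return std::nullopt;
    }
}
```

A caller that needs a lossless identifier therefore has no portable IANA name to fall back on for the Korean and Traditional Chinese system code pages and must keep the numeric code page around instead.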

-- 
Henri Sivonen

Received on 2024-07-21 18:40:18