[SG16] ICU encoding name alias conflicts

From: Tom Honermann <tom_at_[hidden]>
Date: Mon, 15 Nov 2021 18:20:14 -0600
I conducted an audit of all of the encoding names recognized by ICU with
the goal of identifying any cases where comparison under the COMP_NAME
loose matching algorithm specified in P1885 <https://wg21.link/p1885>
would lead to a conflict in selecting an ICU converter. The good news is
that no conflicts were identified that can be attributed to the loose
matching algorithm. However, I found multiple cases in which the same
alias is used for different encodings; these are listed in the table below.
They can be verified with the ICU Converter Explorer
<https://icu4c-demos.unicode.org/icu-bin/convexp?s=UTR22&s=IBM&s=WINDOWS&s=JAVA&s=IANA&s=MIME&s=-&s=ALL&ShowUnavailable=>.
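For reference, here is a minimal C++17 sketch of loose name matching. It assumes COMP_NAME follows the "Charset Alias Matching" rules of Unicode Technical Standard #22: ignore everything except ASCII letters and digits, ignore case, and ignore any 0 that is not preceded by a digit. The namespace sketch and the names comp_name and alias_cursor are hypothetical illustrations. They are not the P1885 wording or any library API.

```cpp
#include <cstddef>
#include <string_view>

namespace sketch {  // hypothetical namespace for illustration only

constexpr bool is_ascii_digit(char c) { return c >= '0' && c <= '9'; }

constexpr bool is_ascii_alpha(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

constexpr char to_ascii_lower(char c) {
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Walks a name and yields only the characters that are significant under
// the assumed UTS #22 matching rules, already lowercased.
struct alias_cursor {
    std::string_view s;
    std::size_t pos = 0;
    bool prev_kept_digit = false;  // was the last kept character a digit?

    // Returns the next significant character, or '\0' when exhausted.
    constexpr char next() {
        while (pos < s.size()) {
            const char c = s[pos++];
            if (!is_ascii_digit(c) && !is_ascii_alpha(c))
                continue;  // ignore '-', '_', '.', spaces, etc.
            if (c == '0' && !prev_kept_digit)
                continue;  // ignore a 0 not preceded by a digit ("008" -> "8")
            prev_kept_digit = is_ascii_digit(c);
            return to_ascii_lower(c);
        }
        return '\0';
    }
};

// True if a and b are equal after applying the rules above to both.
constexpr bool comp_name(std::string_view a, std::string_view b) {
    alias_cursor ca{a};
    alias_cursor cb{b};
    for (;;) {
        const char x = ca.next();
        const char y = cb.next();
        if (x != y) return false;
        if (x == '\0') return true;  // both exhausted at the same point
    }
}

}  // namespace sketch
```

Because the comparison is constexpr, it can be checked at compile time. Under these rules "Shift_JIS" matches "shift-jis", while KS_C_5601-1987 and csKSC56011987 remain distinct.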

I did not scrape the ICU Converter Explorer page to perform the audit.
The data I worked off of was produced with ICU 70.1 by running uconv -l
--canon and then massaging the output.
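The audit can be approximated by grouping aliases by a matching key and flagging any key that maps to more than one converter. The sketch below assumes the uconv output has already been massaged into hypothetical (converter, alias, provider) records. That record layout, and the names audit, alias_record, loose_key, find_conflicts, and sample_records, are illustrative assumptions. They do not describe the format uconv actually prints. Run the grouping twice, once keyed on the exact alias and once on a loose key. Comparing the two results shows whether loose matching adds conflicts beyond those already present in the data.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <string_view>
#include <vector>

namespace audit {  // hypothetical names throughout

// One record of (assumed) massaged `uconv -l --canon` output.
struct alias_record {
    std::string converter;  // ICU converter name, e.g. "ibm-943_P15A-2003"
    std::string alias;      // alias as spelled in the ICU data
    std::string provider;   // tag such as "IANA", "Java", or "Untagged"
};

// Loose key under the assumed UTS #22 rules: keep ASCII letters and digits,
// lowercase letters, and drop any '0' not preceded by a kept digit.
inline std::string loose_key(std::string_view name) {
    std::string key;
    bool prev_digit = false;
    for (const char c : name) {
        const bool digit = c >= '0' && c <= '9';
        const bool upper = c >= 'A' && c <= 'Z';
        const bool lower = c >= 'a' && c <= 'z';
        if (!digit && !upper && !lower) continue;
        if (c == '0' && !prev_digit) continue;
        key += upper ? static_cast<char>(c - 'A' + 'a') : c;
        prev_digit = digit;
    }
    return key;
}

// Groups records by key_of(alias) and keeps only keys naming 2+ converters.
template <class KeyFn>
std::map<std::string, std::set<std::string>>
find_conflicts(const std::vector<alias_record>& records, KeyFn key_of) {
    std::map<std::string, std::set<std::string>> by_key;
    for (const auto& r : records) by_key[key_of(r.alias)].insert(r.converter);
    for (auto it = by_key.begin(); it != by_key.end();) {
        if (it->second.size() < 2) it = by_key.erase(it);
        else ++it;
    }
    return by_key;
}

// A few records taken from the table below.
inline std::vector<alias_record> sample_records() {
    return {
        {"ibm-943_P15A-2003", "cp932", "Windows"},
        {"ibm-942_P12A-1999", "cp932", "Untagged"},
        {"ibm-943_P130-1999", "Shift_JIS", "Untagged"},
        {"ibm-943_P15A-2003", "Shift_JIS", "IANA"},
        {"ibm-1363_P11B-1998", "KS_C_5601-1987", "IANA"},
        {"windows-949-2000", "csKSC56011987", "Windows"},
    };
}

// Example check: exact and loose grouping flag the same number of keys here.
inline void check_sample() {
    const auto recs = sample_records();
    const auto exact = find_conflicts(recs, [](const std::string& a) { return a; });
    const auto loose = find_conflicts(recs, loose_key);
    assert(exact.size() == 2 && loose.size() == 2);
}

}  // namespace audit
```

In this small sample, both groupings flag only cp932 and Shift_JIS. The two KS C 5601 spellings stay distinct under the loose key. This is consistent with, though no substitute for, the audit's finding that the loose matching algorithm itself introduced no conflicts.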

Each row of the table describes a conflict between two ICU encodings,
each of which is named in the leftmost and rightmost columns
respectively. The inner columns list the specific aliases that conflict
and which provider they correspond to.

For at least some of these, one has to wonder if the ICU data is simply
incorrect. Cases that only involve a conflict with an untagged alias are
illustrated in gray so that the others stand out.
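The gray shading does not survive in plain text. The following sketch shows a table row together with one possible reading of "only involve a conflict with an untagged alias": a row qualifies if either inner column lists the alias solely under the Untagged tag. That reading, and the names conflict_row, untagged_only, and check_examples, are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One row of the conflict table (hypothetical representation).
struct conflict_row {
    std::string left_encoding;
    std::vector<std::string> left_providers;   // providers tagging the alias
    std::vector<std::string> right_providers;
    std::string right_encoding;
};

// Assumed criterion: one side carries the alias only as an untagged name.
inline bool untagged_only(const conflict_row& row) {
    const auto only_untagged = [](const std::vector<std::string>& providers) {
        return !providers.empty() &&
               std::all_of(providers.begin(), providers.end(),
                           [](const std::string& p) { return p == "Untagged"; });
    };
    return only_untagged(row.left_providers) || only_untagged(row.right_providers);
}

// Example rows from the table below.
inline void check_examples() {
    const conflict_row cp932{"ibm-943_P15A-2003", {"Windows"}, {"Untagged"},
                             "ibm-942_P12A-1999"};
    const conflict_row ksc{"ibm-1363_P11B-1998", {"IANA"}, {"Java"},
                           "ibm-970_P110_P110-2006_U2"};
    assert(untagged_only(cp932));
    assert(!untagged_only(ksc));
}
```

Under this reading, the cp932 and windows-125x rows would qualify, while the KS_C_5601 and TIS-620 rows would not.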

Can anyone offer an explanation for these conflicts? Do these reflect
defects in ICU (particularly for the cases where the untagged aliases
disagree with the tagged ones)?

ICU encoding         Encoding alias (provider)          Encoding alias (provider)              ICU encoding
-------------------  ---------------------------------  -------------------------------------  ---------------------------
ibm-943_P15A-2003    cp932 (Windows)                    cp932 (Untagged)                       ibm-942_P12A-1999
ibm-943_P130-1999    ibm-943 (IBM, Java)                ibm-943 (Untagged)                     ibm-943_P15A-2003
ibm-943_P130-1999    Shift_JIS (Untagged)               Shift_JIS (Windows, Java, IANA, MIME)  ibm-943_P15A-2003
ibm-33722_P120-1999  ibm-33722 (IBM, Java)              ibm-33722 (Untagged)                   ibm-33722_P12A_P12A-2009_U2
ibm-33722_P120-1999  ibm-5050 (IBM)                     ibm-5050 (Untagged)                    ibm-33722_P12A_P12A-2009_U2
windows-950-2000     windows-950 (Windows)              windows-950 (Untagged)                 ibm-1373_P100-2002
ibm-5471_P100-2006   Big5-HKSCS (Untagged)              Big5-HKSCS (Java, IANA)                ibm-1375_P100-2008
windows-936-2000     windows-936 (Windows, Java, IANA)  windows-936 (Untagged)                 ibm-1386_P100-2001
ibm-949_P11A-1999    ibm-949 (Untagged)                 ibm-949 (IBM, Java)                    ibm-949_P110-1999
ibm-1363_P11B-1998   KS_C_5601-1987 (IANA)              KS_C_5601-1987 (Java)                  ibm-970_P110_P110-2006_U2
ibm-1363_P11B-1998   KSC_5601 (IANA)                    KSC_5601 (Java)                        ibm-970_P110_P110-2006_U2
ibm-1363_P11B-1998   5601 (Untagged)                    5601 (Java)                            ibm-970_P110_P110-2006_U2
ibm-1363_P110-1997   ibm-1363 (IBM)                     ibm-1363 (Untagged)                    ibm-1363_P11B-1998
windows-949-2000     windows-949 (Windows, Java)        windows-949 (Untagged)                 ibm-1363_P11B-1998
windows-949-2000     KS_C_5601-1987 (Windows)           KS_C_5601-1987 (Java)                  ibm-970_P110_P110-2006_U2
windows-949-2000     KS_C_5601-1989 (Windows)           KS_C_5601-1989 (IANA)                  ibm-1363_P11B-1998
windows-949-2000     KSC_5601 (Windows, MIME)           KSC_5601 (Java)                        ibm-970_P110_P110-2006_U2
windows-949-2000     csKSC56011987 (Windows)            csKSC56011987 (IANA)                   ibm-1363_P11B-1998
windows-949-2000     korean (Windows)                   korean (IANA)                          ibm-1363_P11B-1998
windows-949-2000     iso-ir-149 (Windows)               iso-ir-149 (IANA)                      ibm-1363_P11B-1998
ibm-874_P100-1995    TIS-620 (Java, IANA)               TIS-620 (Windows)                      windows-874-2000
ibm-1250_P100-1995   windows-1250 (Untagged)            windows-1250 (Windows, Java, IANA)     ibm-5346_P100-1998
ibm-1251_P100-1995   windows-1251 (Untagged)            windows-1251 (Windows, Java, IANA)     ibm-5347_P100-1998
ibm-1252_P100-2000   windows-1252 (Untagged)            windows-1252 (Windows, Java, IANA)     ibm-5348_P100-1997
ibm-1253_P100-1995   windows-1253 (Untagged)            windows-1253 (Windows, Java, IANA)     ibm-5349_P100-1998
ibm-1254_P100-1995   windows-1254 (Untagged)            windows-1254 (Windows, Java, IANA)     ibm-5350_P100-1998
ibm-5351_P100-1998   windows-1255 (Untagged)            windows-1255 (Windows, Java, IANA)     ibm-9447_P100-2002
ibm-5352_P100-1998   windows-1256 (Untagged)            windows-1256 (Windows, Java, IANA)     ibm-9448_X100-2005
ibm-5353_P100-1998   windows-1257 (Untagged)            windows-1257 (Windows, Java, IANA)     ibm-9449_P100-2002
ibm-1258_P100-1997   windows-1258 (Untagged)            windows-1258 (Windows, Java, IANA)     ibm-5354_P100-1998

Tom.


Received on 2021-11-15 18:20:19