Date: Mon, 15 Nov 2021 18:20:14 -0600
I conducted an audit of all of the encoding names recognized by ICU with
the goal of identifying any cases where comparison under the COMP_NAME
loose matching algorithm specified in P1885 <https://wg21.link/p1885>
would lead to a conflict in selecting an ICU converter. The good news is
that no conflicts were identified that can be attributed to the loose
matching algorithm. However, I found that the same alias is used for
different encodings in multiple cases as described in the table below.
These can be verified with ICU Converter Explorer
<https://icu4c-demos.unicode.org/icu-bin/convexp?s=UTR22&s=IBM&s=WINDOWS&s=JAVA&s=IANA&s=MIME&s=-&s=ALL&ShowUnavailable=>.
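For readers unfamiliar with the loose matching mentioned above, here is a minimal sketch of how I read the COMP_NAME comparison, assuming it follows the charset alias matching rules of UTS #22 (keep only ASCII letters and digits, fold case, drop zeros that do not follow a digit). The function name comp_name is hypothetical and is not taken from P1885 or from ICU.

```python
def comp_name(name: str) -> str:
    """Normalize an encoding name for loose comparison.

    Assumption: this mirrors the UTS #22 charset alias matching rules;
    it is an illustration, not the wording of P1885.
    """
    out = []
    prev_is_digit = False
    for ch in name:
        # Keep only ASCII letters and digits; everything else is ignored.
        if not (ch.isascii() and ch.isalnum()):
            continue
        # A zero that does not follow a kept digit is a leading zero: drop it.
        if ch == "0" and not prev_is_digit:
            continue
        out.append(ch.lower())
        prev_is_digit = ch.isdigit()
    return "".join(out)


def names_match(a: str, b: str) -> bool:
    """Two names match loosely when their normalized forms are equal."""
    return comp_name(a) == comp_name(b)
```

Under this reading, names_match("KS_C_5601-1987", "ksc5601_1987") is true, while KS_C_5601-1987 and KS_C_5601-1989 remain distinct names.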
I did not scrape the ICU Converter Explorer page to perform the audit.
I worked from data produced with ICU 70.1 by running uconv -l
--canon and then massaging the output.
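The conflict check itself can be sketched roughly as follows. This is not the script I used; it assumes the uconv output has already been massaged into (converter, alias, provider) triples, which is a hypothetical layout, and the helper names loose_key and find_conflicts are made up for illustration. The sample triples are illustrative only; the first two are modeled on the cp932 row of the table, and which provider tag belongs to which converter is an assumption.

```python
import re
from collections import defaultdict


def loose_key(name: str) -> str:
    """Loose-matching key, assuming UTS #22 style alias matching."""
    # Keep only ASCII letters and digits, folded to lowercase.
    s = re.sub(r"[^A-Za-z0-9]", "", name).lower()
    # Drop runs of zeros that do not follow a digit.
    return re.sub(r"(?<![0-9])0+", "", s)


def find_conflicts(triples):
    """Map each loose key that names two or more converters to those converters.

    The result has the shape {key: {converter: [(alias, provider), ...]}}.
    """
    by_key = defaultdict(lambda: defaultdict(list))
    for converter, alias, provider in triples:
        by_key[loose_key(alias)][converter].append((alias, provider))
    return {
        key: dict(converters)
        for key, converters in by_key.items()
        if len(converters) > 1
    }


# Illustrative input only; not real uconv output.
sample = [
    ("ibm-943_P15A-2003", "cp932", "Windows"),
    ("ibm-942_P12A-1999", "cp932", "Untagged"),
    ("example-converter", "Example-01", "IANA"),      # hypothetical entry
    ("example-converter", "example1", "Untagged"),    # hypothetical entry
]

conflicts = find_conflicts(sample)
for key, converters in sorted(conflicts.items()):
    print(key, "->", sorted(converters))
```

With this sample only cp932 is reported, because the two hypothetical aliases normalize to the same key but belong to the same converter.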
Each row of the table describes a conflict between two ICU encodings,
which are named in the leftmost and rightmost columns respectively.
The inner columns list the specific aliases that conflict and the
provider each one corresponds to.
For at least some of these, one has to wonder if the ICU data is simply
incorrect. Cases that only involve a conflict with an untagged alias are
illustrated in gray so that the others stand out.
Can anyone offer an explanation for these conflicts? Do these reflect
defects in ICU (particularly for the cases where the untagged aliases
disagree with the tagged ones)?

ICU encoding | Encoding alias (provider) | Encoding alias (provider) | ICU encoding

ibm-943_P15A-2003
  cp932 (Windows)
  cp932 (Untagged)
ibm-942_P12A-1999

ibm-943_P130-1999
  ibm-943 (IBM)
  ibm-943 (Java)
  ibm-943 (Untagged)
ibm-943_P15A-2003

ibm-943_P130-1999
  Shift_JIS (Untagged)
  Shift_JIS (Windows)
  Shift_JIS (Java)
  Shift_JIS (IANA)
  Shift_JIS (MIME)
ibm-943_P15A-2003

ibm-33722_P120-1999
  ibm-33722 (IBM)
  ibm-33722 (Java)
  ibm-33722 (Untagged)
ibm-33722_P12A_P12A-2009_U2

ibm-33722_P120-1999
  ibm-5050 (IBM)
  ibm-5050 (Untagged)
ibm-33722_P12A_P12A-2009_U2

windows-950-2000
  windows-950 (Windows)
  windows-950 (Untagged)
ibm-1373_P100-2002

ibm-5471_P100-2006
  Big5-HKSCS (Untagged)
  Big5-HKSCS (Java)
  Big5-HKSCS (IANA)
ibm-1375_P100-2008

windows-936-2000
  windows-936 (Windows)
  windows-936 (Java)
  windows-936 (IANA)
  windows-936 (Untagged)
ibm-1386_P100-2001

ibm-949_P11A-1999
  ibm-949 (Untagged)
  ibm-949 (IBM)
  ibm-949 (Java)
ibm-949_P110-1999

ibm-1363_P11B-1998
  KS_C_5601-1987 (IANA)
  KS_C_5601-1987 (Java)
ibm-970_P110_P110-2006_U2

ibm-1363_P11B-1998
  KSC_5601 (IANA)
  KSC_5601 (Java)
ibm-970_P110_P110-2006_U2

ibm-1363_P11B-1998
  5601 (Untagged)
  5601 (Java)
ibm-970_P110_P110-2006_U2

ibm-1363_P110-1997
  ibm-1363 (IBM)
  ibm-1363 (Untagged)
ibm-1363_P11B-1998

windows-949-2000
  windows-949 (Windows)
  windows-949 (Java)
  windows-949 (Untagged)
ibm-1363_P11B-1998

windows-949-2000
  KS_C_5601-1987 (Windows)
  KS_C_5601-1987 (Java)
ibm-970_P110_P110-2006_U2

windows-949-2000
  KS_C_5601-1989 (Windows)
  KS_C_5601-1989 (IANA)
ibm-1363_P11B-1998

windows-949-2000
  KSC_5601 (Windows)
  KSC_5601 (MIME)
  KSC_5601 (Java)
ibm-970_P110_P110-2006_U2

windows-949-2000
  csKSC56011987 (Windows)
  csKSC56011987 (IANA)
ibm-1363_P11B-1998

windows-949-2000
  korean (Windows)
  korean (IANA)
ibm-1363_P11B-1998

windows-949-2000
  iso-ir-149 (Windows)
  iso-ir-149 (IANA)
ibm-1363_P11B-1998

ibm-874_P100-1995
  TIS-620 (Java)
  TIS-620 (IANA)
  TIS-620 (Windows)
windows-874-2000

ibm-1250_P100-1995
  windows-1250 (Untagged)
  windows-1250 (Windows)
  windows-1250 (Java)
  windows-1250 (IANA)
ibm-5346_P100-1998

ibm-1251_P100-1995
  windows-1251 (Untagged)
  windows-1251 (Windows)
  windows-1251 (Java)
  windows-1251 (IANA)
ibm-5347_P100-1998

ibm-1252_P100-2000
  windows-1252 (Untagged)
  windows-1252 (Windows)
  windows-1252 (Java)
  windows-1252 (IANA)
ibm-5348_P100-1997

ibm-1253_P100-1995
  windows-1253 (Untagged)
  windows-1253 (Windows)
  windows-1253 (Java)
  windows-1253 (IANA)
ibm-5349_P100-1998

ibm-1254_P100-1995
  windows-1254 (Untagged)
  windows-1254 (Windows)
  windows-1254 (Java)
  windows-1254 (IANA)
ibm-5350_P100-1998

ibm-5351_P100-1998
  windows-1255 (Untagged)
  windows-1255 (Windows)
  windows-1255 (Java)
  windows-1255 (IANA)
ibm-9447_P100-2002

ibm-5352_P100-1998
  windows-1256 (Untagged)
  windows-1256 (Windows)
  windows-1256 (Java)
  windows-1256 (IANA)
ibm-9448_X100-2005

ibm-5353_P100-1998
  windows-1257 (Untagged)
  windows-1257 (Windows)
  windows-1257 (Java)
  windows-1257 (IANA)
ibm-9449_P100-2002

ibm-1258_P100-1997
  windows-1258 (Untagged)
  windows-1258 (Windows)
  windows-1258 (Java)
  windows-1258 (IANA)
ibm-5354_P100-1998

Tom.
Received on 2021-11-15 18:20:19