I conducted an audit of all of the encoding names recognized by ICU with the goal of identifying any cases where comparison under the COMP_NAME loose matching algorithm specified in P1885 would lead to a conflict in selecting an ICU converter. The good news is that no conflicts were identified that can be attributed to the loose matching algorithm. However, I found that the same alias is used for different encodings in multiple cases as described in the table below. These can be verified with ICU Converter Explorer.
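For readers unfamiliar with loose matching, here is a minimal sketch. It assumes COMP_NAME follows the charset alias matching steps of Unicode Technical Standard #22, section 1.4: drop everything but ASCII letters and digits, fold case, and drop any `0` not preceded by a digit. P1885's final wording may treat zeros in more detail than this. The function names `comp_name_key` and `comp_name` are hypothetical, and the sketch reads "preceded by" as "preceded by the previously kept character". An audit like the one described amounts to checking whether aliases for different converters collapse to the same key.

```python
def comp_name_key(name: str) -> str:
    """Normalize an encoding name using UTS #22 section 1.4 style alias matching.

    Hypothetical helper; assumes COMP_NAME follows these steps.
    """
    out: list[str] = []
    for ch in name:
        # Step 1: ignore everything except ASCII letters and digits.
        if not (ch.isascii() and ch.isalnum()):
            continue
        # Step 2: fold ASCII uppercase to lowercase.
        ch = ch.lower()
        # Step 3: drop a '0' unless the previously kept character is a digit.
        if ch == "0" and not (out and out[-1].isdigit()):
            continue
        out.append(ch)
    return "".join(out)


def comp_name(a: str, b: str) -> bool:
    """Return True if two encoding names match under this loose matching sketch."""
    return comp_name_key(a) == comp_name_key(b)
```

Under this rule "Shift_JIS", "shift-jis", and "SHIFTJIS" all compare equal. "windows-1250" and "windows-125" do not, because the trailing `0` follows a digit and is kept.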
I did not scrape the ICU Converter Explorer page to perform the audit. The data I worked off of was produced with ICU 70.1 by running uconv -l --canon and then massaging the output.
Each row of the table describes a conflict between two ICU encodings, named in the leftmost and rightmost columns respectively. The inner columns list the specific aliases that conflict and the providers they correspond to.
For at least some of these, one has to wonder whether the ICU data is simply incorrect. Cases that involve only a conflict with an untagged alias (one side of the row lists only an (Untagged) provider) are of less interest; the others are the ones that stand out.
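Conflicts like those in the table could be found mechanically from per-converter alias lists. The same is true of the split between untagged-only cases and the rest. This sketch assumes input shaped as `{converter: [(alias, providers), ...]}`, with "Untagged" as a pseudo-provider for aliases that carry no tag. The helpers `find_alias_conflicts` and `is_untagged_only` are hypothetical. The sketch reads "untagged-only" as meaning that at most one of the converters claiming the alias does so with a real provider tag.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Iterable

UNTAGGED = "Untagged"  # pseudo-provider for aliases listed without a tag


def find_alias_conflicts(
    converters: dict[str, Iterable[tuple[str, Iterable[str]]]],
    key: Callable[[str], str] = str.lower,
) -> dict[str, dict[str, set[str]]]:
    """Return {alias_key: {converter: providers}} for aliases claimed by 2+ converters."""
    claims: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for conv, aliases in converters.items():
        for alias, providers in aliases:
            # Group claims by normalized alias, then by the converter making the claim.
            claims[key(alias)][conv] |= set(providers)
    return {k: dict(by_conv) for k, by_conv in claims.items() if len(by_conv) > 1}


def is_untagged_only(by_conv: dict[str, set[str]]) -> bool:
    """True if at most one converter claims the alias with a real provider tag."""
    tagged = [conv for conv, providers in by_conv.items() if providers - {UNTAGGED}]
    return len(tagged) <= 1
```

The default `key` only folds case, so it reports aliases spelled the same way, as in the table. Passing a COMP_NAME-style normalization function as `key` would also surface conflicts that appear only under loose matching. The audit described above found none of those.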
Can anyone offer an explanation for these conflicts? Do they reflect defects in ICU, particularly in the cases where an untagged alias disagrees with the tagged aliases?
| ICU encoding | Encoding alias (provider) | Encoding alias (provider) | ICU encoding |
|---|---|---|---|
| ibm-943_P15A-2003 | cp932 (Windows) | cp932 (Untagged) | ibm-942_P12A-1999 |
| ibm-943_P130-1999 | ibm-943 (IBM), ibm-943 (Java) | ibm-943 (Untagged) | ibm-943_P15A-2003 |
| ibm-943_P130-1999 | Shift_JIS (Untagged) | Shift_JIS (Windows), Shift_JIS (Java), Shift_JIS (IANA), Shift_JIS (MIME) | ibm-943_P15A-2003 |
| ibm-33722_P120-1999 | ibm-33722 (IBM), ibm-33722 (Java) | ibm-33722 (Untagged) | ibm-33722_P12A_P12A-2009_U2 |
| ibm-33722_P120-1999 | ibm-5050 (IBM) | ibm-5050 (Untagged) | ibm-33722_P12A_P12A-2009_U2 |
| windows-950-2000 | windows-950 (Windows) | windows-950 (Untagged) | ibm-1373_P100-2002 |
| ibm-5471_P100-2006 | Big5-HKSCS (Untagged) | Big5-HKSCS (Java), Big5-HKSCS (IANA) | ibm-1375_P100-2008 |
| windows-936-2000 | windows-936 (Windows), windows-936 (Java), windows-936 (IANA) | windows-936 (Untagged) | ibm-1386_P100-2001 |
| ibm-949_P11A-1999 | ibm-949 (Untagged) | ibm-949 (IBM), ibm-949 (Java) | ibm-949_P110-1999 |
| ibm-1363_P11B-1998 | KS_C_5601-1987 (IANA) | KS_C_5601-1987 (Java) | ibm-970_P110_P110-2006_U2 |
| ibm-1363_P11B-1998 | KSC_5601 (IANA) | KSC_5601 (Java) | ibm-970_P110_P110-2006_U2 |
| ibm-1363_P11B-1998 | 5601 (Untagged) | 5601 (Java) | ibm-970_P110_P110-2006_U2 |
| ibm-1363_P110-1997 | ibm-1363 (IBM) | ibm-1363 (Untagged) | ibm-1363_P11B-1998 |
| windows-949-2000 | windows-949 (Windows), windows-949 (Java) | windows-949 (Untagged) | ibm-1363_P11B-1998 |
| windows-949-2000 | KS_C_5601-1987 (Windows) | KS_C_5601-1987 (Java) | ibm-970_P110_P110-2006_U2 |
| windows-949-2000 | KS_C_5601-1989 (Windows) | KS_C_5601-1989 (IANA) | ibm-1363_P11B-1998 |
| windows-949-2000 | KSC_5601 (Windows), KSC_5601 (MIME) | KSC_5601 (Java) | ibm-970_P110_P110-2006_U2 |
| windows-949-2000 | csKSC56011987 (Windows) | csKSC56011987 (IANA) | ibm-1363_P11B-1998 |
| windows-949-2000 | korean (Windows) | korean (IANA) | ibm-1363_P11B-1998 |
| windows-949-2000 | iso-ir-149 (Windows) | iso-ir-149 (IANA) | ibm-1363_P11B-1998 |
| ibm-874_P100-1995 | TIS-620 (Java), TIS-620 (IANA) | TIS-620 (Windows) | windows-874-2000 |
| ibm-1250_P100-1995 | windows-1250 (Untagged) | windows-1250 (Windows), windows-1250 (Java), windows-1250 (IANA) | ibm-5346_P100-1998 |
| ibm-1251_P100-1995 | windows-1251 (Untagged) | windows-1251 (Windows), windows-1251 (Java), windows-1251 (IANA) | ibm-5347_P100-1998 |
| ibm-1252_P100-2000 | windows-1252 (Untagged) | windows-1252 (Windows), windows-1252 (Java), windows-1252 (IANA) | ibm-5348_P100-1997 |
| ibm-1253_P100-1995 | windows-1253 (Untagged) | windows-1253 (Windows), windows-1253 (Java), windows-1253 (IANA) | ibm-5349_P100-1998 |
| ibm-1254_P100-1995 | windows-1254 (Untagged) | windows-1254 (Windows), windows-1254 (Java), windows-1254 (IANA) | ibm-5350_P100-1998 |
| ibm-5351_P100-1998 | windows-1255 (Untagged) | windows-1255 (Windows), windows-1255 (Java), windows-1255 (IANA) | ibm-9447_P100-2002 |
| ibm-5352_P100-1998 | windows-1256 (Untagged) | windows-1256 (Windows), windows-1256 (Java), windows-1256 (IANA) | ibm-9448_X100-2005 |
| ibm-5353_P100-1998 | windows-1257 (Untagged) | windows-1257 (Windows), windows-1257 (Java), windows-1257 (IANA) | ibm-9449_P100-2002 |
| ibm-1258_P100-1997 | windows-1258 (Untagged) | windows-1258 (Windows), windows-1258 (Java), windows-1258 (IANA) | ibm-5354_P100-1998 |
Tom.