Re: [SG16] UK national body concerns about P1885R1 'Naming Text Encodings to Demystify Them'

From: Ville Voutilainen <ville.voutilainen_at_[hidden]>
Date: Tue, 24 Mar 2020 16:44:40 +0200
On Tue, 24 Mar 2020 at 16:42, Steven R. Loomis via SG16
<sg16_at_[hidden]> wrote:
>
> Corentin,
> Please see some of the work done in ICU on encodings.
>
> In particular, IANA does not specify the actual mapping, so we have found the IANA names insufficient to distinguish between actual encodings; shift_jis is an example. Comment and data file:
> https://github.com/unicode-org/icu/blob/master/icu4c/source/data/mappings/convrtrs.txt#L93
>
> So while IANA names are widely used from a spec point of view, in practice there are many, many challenges with their use in implementation.
>
>
> There is a stabilized UTR22 ( https://unicode.org/reports/tr22/ ) for an XML form of character mappings, and ICU has produced an XML form: https://github.com/unicode-org/icu-data/tree/master/charset/data/xml However, we stopped short of actually maintaining a registry of UTR22 mappings with their names (other than what is in ICU).
>
> In practice, I think most of the use of encodings today falls under the WHATWG set: https://encoding.spec.whatwg.org/ - which defines the exact behavior and naming for a very small subset of encodings.
>
>
> So I also have concerns about simply referring to IANA.

Would referring to ISO 15897 be different?

Received on 2020-03-24 09:47:43