Re: [SG16] UK national body concerns about P1885R1 'Naming Text Encodings to Demystify Them'

From: Steven R. Loomis <srl295_at_[hidden]>
Date: Tue, 24 Mar 2020 07:42:43 -0700
Corentin,
 Please see some of the work done in ICU on encodings.

In particular, IANA does not specify the actual byte-to-character mapping, so we have found the IANA names insufficient to distinguish between two actual encodings; shift_jis is an example. See the comment and data file:
https://github.com/unicode-org/icu/blob/master/icu4c/source/data/mappings/convrtrs.txt#L93

So while IANA names are widely used from a spec point of view, in practice there are many, many challenges with their use in implementation.
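
To make the shift_jis problem concrete, here is a hedged sketch (not derived from ICU's data files) of one well-known disagreement between mapping tables that are both used for data labeled Shift_JIS. A table that follows JIS X 0201 Roman decodes byte 0x5C as U+00A5 YEN SIGN and 0x7E as U+203E OVERLINE, while Microsoft's Windows-31J (code page 932) keeps them as ASCII backslash and tilde. The enum `sjis_variant` and the function `decode_single_byte` are hypothetical names, and only the 0x00-0x7F range is modeled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: two mapping tables that software may both call "Shift_JIS".
enum class sjis_variant { jis_x0201_roman, windows_31j };

// Decode one single-byte code unit in 0x00-0x7F to a Unicode code point.
// Double-byte sequences and the 0xA1-0xDF katakana range are not modeled.
constexpr char32_t decode_single_byte(sjis_variant v, std::uint8_t b) {
    assert(b < 0x80 && "only the ASCII-range bytes are modeled");
    if (v == sjis_variant::jis_x0201_roman) {
        if (b == 0x5C) return U'\u00A5';  // YEN SIGN
        if (b == 0x7E) return U'\u203E';  // OVERLINE
    }
    // Windows-31J (and every other byte here) stays identical to ASCII.
    return static_cast<char32_t>(b);
}

// Same label, same byte, different character.
static_assert(decode_single_byte(sjis_variant::jis_x0201_roman, 0x5C) !=
                  decode_single_byte(sjis_variant::windows_31j, 0x5C),
              "the two tables disagree on 0x5C");
```

The same byte therefore decodes differently depending on which table an implementation picks, and the name alone does not say which one is meant.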


There is a stabilized UTR22 (https://unicode.org/reports/tr22/) that defines an XML format for character mappings, and ICU produced data in that format: https://github.com/unicode-org/icu-data/tree/master/charset/data/xml However, we stopped short of actually maintaining a registry of UTR22 mappings with their names (other than what is in ICU).
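
UTR22 also describes a loose "charset alias matching" rule for comparing names: drop everything except ASCII letters and digits, fold to lowercase, and delete each 0 that is not preceded by a digit. Below is a minimal C++17 sketch of that rule as summarized here (the report's own wording is normative); `normalize_charset_name` and `charset_names_match` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Loose charset-name normalization, following the UTR22 alias-matching
// rule as summarized above:
//   1. keep only ASCII letters and digits,
//   2. fold A-Z to a-z,
//   3. drop each '0' that is not preceded by a digit.
std::string normalize_charset_name(std::string_view name) {
    std::string out;
    for (char c : name) {
        const bool digit = c >= '0' && c <= '9';
        const bool upper = c >= 'A' && c <= 'Z';
        const bool lower = c >= 'a' && c <= 'z';
        if (!digit && !upper && !lower) continue;          // step 1
        if (upper) c = static_cast<char>(c - 'A' + 'a');   // step 2
        // Step 3: "preceded by" means the last character already kept.
        const bool prev_is_digit =
            !out.empty() && out.back() >= '0' && out.back() <= '9';
        if (c == '0' && !prev_is_digit) continue;
        out.push_back(c);
    }
    return out;
}

// Two names refer to the same charset entry if their normalized forms match.
bool charset_names_match(std::string_view a, std::string_view b) {
    return normalize_charset_name(a) == normalize_charset_name(b);
}
```

Under this rule `UTF-8` and `utf8` compare equal, as do `CP-0437` and `cp437`. Loose matching only identifies names; it says nothing about which mapping table sits behind a name.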

In practice, I think most of the use of encodings today falls under the WHATWG set (https://encoding.spec.whatwg.org/), which precisely defines both the behavior and the names of a very small set of encodings.
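
The WHATWG spec's lookup from label to encoding (its "get an encoding" algorithm) is deliberately simple: strip leading and trailing ASCII whitespace, then match the label ASCII case-insensitively against a fixed table. The C++17 sketch below copies only a handful of entries from that table by hand; `label_entry`, `kLabels`, and `get_encoding` are hypothetical names, and the full table in the spec is much longer.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Partial, hand-copied excerpt of the WHATWG label table (labels lowercase).
struct label_entry { std::string_view label; std::string_view encoding; };

constexpr label_entry kLabels[] = {
    {"unicode-1-1-utf-8", "UTF-8"}, {"utf-8", "UTF-8"}, {"utf8", "UTF-8"},
    {"ascii", "windows-1252"},      {"iso-8859-1", "windows-1252"},
    {"latin1", "windows-1252"},     {"us-ascii", "windows-1252"},
    {"windows-1252", "windows-1252"},
    {"shift_jis", "Shift_JIS"},     {"sjis", "Shift_JIS"},
    {"windows-31j", "Shift_JIS"},   {"ms_kanji", "Shift_JIS"},
};

// ASCII whitespace as the spec defines it: TAB, LF, FF, CR, SPACE.
constexpr bool is_ascii_whitespace(char c) {
    return c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
}

// Sketch of "get an encoding": trim, then ASCII case-insensitive lookup.
std::optional<std::string_view> get_encoding(std::string_view label) {
    while (!label.empty() && is_ascii_whitespace(label.front())) label.remove_prefix(1);
    while (!label.empty() && is_ascii_whitespace(label.back())) label.remove_suffix(1);
    std::string lowered(label);
    for (char& c : lowered)
        if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
    for (const auto& e : kLabels)
        if (e.label == std::string_view(lowered)) return e.encoding;
    return std::nullopt;  // the spec calls this "failure"
}
```

For example, `latin1` and `ascii` both resolve to windows-1252 rather than to ISO-8859-1 or US-ASCII, which illustrates that the spec pins down behavior, not just names.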


So I also have concerns about simply referring to IANA.


--
Steven R. Loomis | @srl295 | git.io/srl295
> On Tue, Mar 24, 2020, at 2:35 a.m., Corentin via SG16 <sg16_at_[hidden]> wrote:
> 
> Hey! 
> Thanks for your feedback
> 
> A few things:
> 
> * It does not change much (neither the database nor the proposal is forward-looking; RFC3808 is from 2004).
> * There is nothing more complete (or more official).
> * It has vendor buy-in (from Microsoft and IBM, for which it maps to their code pages), and the same names are also used by iconv on Unix systems.
> * It is widely used by browsers and mail clients.
> * We have experience with referencing RFCs in the standard.
> * If this is still a concern, we could duplicate the entire registry in the standard, which I would recommend against.
> 
> That standard registry is pivotal to the proposal's portability: we need to agree on names and meanings.
> 
> I hope that helps,
> 
> Regards, 
> Corentin
> 
> 
> On Tue, 24 Mar 2020 at 09:26, Peter Brett <pbrett_at_[hidden]> wrote:
> Hi Corentin and SG16,
> 
> We discussed P1885R1 briefly in the British Standards Institution meeting yesterday.
> 
> We support the general direction of the paper and agree that it seeks to solve a real problem.  We support further work.
> 
> We have significant concerns about the proposal to rely on the IANA registry and RFC2978/RFC3808 process, including a normative reference to the Character Sets database.  The Character Sets database is not an International Standard and is maintained by a process that appears to provide neither the quality assurance nor the checks and balances built into the ISO process.
> 
> Best regards,
> 
>                            Peter
> 

Received on 2020-03-24 09:45:35