SG16

Subject: Re: UK national body concerns about P1885R1 'Naming Text Encodings to Demystify Them'
From: Tom Honermann (tom_at_[hidden])
Date: 2020-03-24 12:49:02


On 3/24/20 11:42 AM, Corentin via SG16 wrote:
>
>
> On Tue, 24 Mar 2020 at 15:42, Steven R. Loomis <srl295_at_[hidden]
> <mailto:srl295_at_[hidden]>> wrote:
>
> Corentin,
>  Please see some of the work done in ICU on encodings.
>
> In particular, IANA does not specify the actual mapping. So we
> have found the IANA names insufficient to distinguish two actual
> encodings, shift_jis is an example.  Comment and datafile:
> https://github.com/unicode-org/icu/blob/master/icu4c/source/data/mappings/convrtrs.txt#L93
>
> So while IANA names are widely used from a spec point of view, in
> practice there are many, many challenges with their use in
> implementation.
>
>
> This proposal is solely about names and not encoding conversion facilities

Corentin, I think you misunderstood Steven's response.  I believe his
point was that some (at least one) of the names in the IANA registry are
insufficient to uniquely identify an encoding.

Tom.

>
>
> There is a stabilized UTR22 https://unicode.org/reports/tr22/ for
> an XML form of char mappings, and ICU had produced an XML form:
> https://github.com/unicode-org/icu-data/tree/master/charset/data/xml
> However, we stopped short of actually maintaining a registry of UTR22
> mappings with their names (other than what is in ICU.)
>
> In practice, I think most of the use of encodings today falls
> under the WHATWG set: https://encoding.spec.whatwg.org/ - which
> defines exactly the behavior and naming for a very small subset.
>
>
> So I also have concerns about simply referring to IANA.
>
>
> --
> Steven R. Loomis | @srl295 | git.io/srl295 <http://git.io/srl295>
>
>
>
>> El mar. 24, 2020, a las 2:35 a. m., Corentin via SG16
>> <sg16_at_[hidden] <mailto:sg16_at_[hidden]>> escribió:
>>
>> Hey!
>> Thanks for your feedback
>>
>> A few things:
>>
>> * It does not evolve a lot (Neither the database nor the proposal
>> are forward looking - RFC3808 is from 2004)
>> * There is nothing more complete (or more official)
>> * It has vendor buy-in (from Microsoft and IBM, for which it maps
>> to their code pages); the same names are also used by iconv on
>> Unix systems
>> * It is widely used by browsers, mail clients
>> * We have experience with referencing RFCs in the standards.
>> * If this is still a concern, we could duplicate the entire thing
>> in the standard - which I would recommend against.
>>
>> That standard registry is pivotal to the proposal's portability. We
>> need to agree on names and meaning.
>>
>> I hope that helps,
>>
>> Regards,
>> Corentin
>>
>>
>> On Tue, 24 Mar 2020 at 09:26, Peter Brett <pbrett_at_[hidden]
>> <mailto:pbrett_at_[hidden]>> wrote:
>>
>> Hi Corentin and SG16,
>>
>> We discussed P1885R1 briefly in the British Standards
>> Institute meeting yesterday.
>>
>> We support the general direction of the paper and agree that
>> it seeks to solve a real problem.  We support further work.
>>
>> We have significant concerns about the proposal to rely on
>> the IANA registry and RFC2978/RFC3808 process, including a
>> normative reference to the Character Sets database.  The
>> Character Sets database is not an International Standard and
>> is maintained by a process that appears to provide neither
>> the quality assurance nor the checks and balances built into
>> the ISO process.
>>
>> Best regards,
>>
>>                            Peter
>>
>> --
>> SG16 mailing list
>> SG16_at_[hidden] <mailto:SG16_at_[hidden]>
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
>
