Date: Tue, 24 Mar 2020 16:42:10 +0100
On Tue, 24 Mar 2020 at 15:42, Steven R. Loomis <srl295_at_[hidden]> wrote:
> Corentin,
> Please see some of the work done in ICU on encodings.
>
> In particular, IANA does not specify the actual mapping. So we have found
> the IANA names insufficient to distinguish two actual encodings, shift_jis
> is an example. Comment and datafile:
>
> https://github.com/unicode-org/icu/blob/master/icu4c/source/data/mappings/convrtrs.txt#L93
>
> So while IANA names are widely used from a spec point of view, in practice
> there are many, many challenges with their use in implementation.
>
This proposal is solely about names, not encoding conversion facilities.
>
>
> There is a stabilized UTR22 https://unicode.org/reports/tr22/ for an XML
> form of char mappings, and ICU had produced an XML form:
> https://github.com/unicode-org/icu-data/tree/master/charset/data/xml However,
> we stopped short of actually maintaining a registry of UTR22 mappings with
> their names (other than what is in ICU.)
>
> In practice, I think most of the use of encodings today falls under the
> WHATWG set: https://encoding.spec.whatwg.org/ - which defines exactly the
> behavior and naming for a very small subset.
>
>
> So I also have concerns about simply referring to IANA.
>
>
> --
> Steven R. Loomis | @srl295 | git.io/srl295
>
>
>
> On Mar 24, 2020, at 2:35 a.m., Corentin via SG16 <
> sg16_at_[hidden]> wrote:
>
> Hey!
> Thanks for your feedback
>
> A few things:
>
> * It does not evolve a lot (neither the database nor the proposal is
> forward looking - RFC3808 is from 2004)
> * There is nothing more complete (or more official)
> * It has vendor buy-in (from Microsoft and IBM, for which it maps to their
> code pages); the same names are also used by iconv on Unix systems
> * It is widely used by browsers, mail clients
> * We have experience with referencing RFCs in the standards.
> * If this is still a concern, we could duplicate the entire thing in the
> standard - which I would recommend against.
>
> That standard registry is pivotal to the proposal's portability. We need to
> agree on names and meaning.
>
> I hope that helps,
>
> Regards,
> Corentin
>
>
> On Tue, 24 Mar 2020 at 09:26, Peter Brett <pbrett_at_[hidden]> wrote:
>
>> Hi Corentin and SG16,
>>
>> We discussed P1885R1 briefly in the British Standards Institute meeting
>> yesterday.
>>
>> We support the general direction of the paper and agree that it seeks to
>> solve a real problem. We support further work.
>>
>> We have significant concerns about the proposal to rely on the IANA
>> registry and RFC2978/RFC3808 process, including a normative reference to
>> the Character Sets database. The Character Sets database is not an
>> International Standard and is maintained by a process that appears to
>> provide neither the quality assurance nor the checks and balances built
>> into the ISO process.
>>
>> Best regards,
>>
>> Peter
>>
>> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
>
>
Received on 2020-03-24 10:45:19