Date: Fri, 24 Sep 2021 10:17:53 +0200
Jens, Hubert.
Are you satisfied with the added recommended practice sections, and other
changes?
Thanks,
Have a great day,
Corentin
On Thu, Sep 23, 2021 at 2:16 PM Corentin <corentin.jabot_at_[hidden]> wrote:
>
>
> On Thu, Sep 23, 2021 at 2:00 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
>
>> On 23/09/2021 13.10, Peter Brett wrote:
>> >> -----Original Message-----
>> >> From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Jens Maurer
>> via SG16
>> >> Sent: 23 September 2021 11:18
>> >> To: Corentin <corentin.jabot_at_[hidden]>
>> >
>> >>> "Obviously broken" is a rather big claim in the absence of suitable
>> >> alternatives.
>> >>> I believe the use of the IANA registry is motivated by the paper and
>> >> previous polls.
>> >>
>> >> I think the polls are not sufficiently precise to argue for the
>> case
>> >> that what the IANA table describes (by implication) as a narrow
>> encoding
>> >> cannot be re-used to designate a wide encoding trivially derived from
>> the
>> >> narrow encoding.
>> >>
>> >
>> > This is a good thing.
>> >
>> > If I:
>> >
>> > 1. Obtain the wide literal encoding, E, with the P1885 facility
>> > 2. Obtain a wide string literal.
>> > 3. Memory copy the string literal into a byte array.
>> > 4. Ask an external library [1] "is this byte array validly encoded with
>> this encoding, E".
>>
>> Ok, then the encoding value does need to represent the narrow/wide
>> differentiation, and also the endianness on the platform, because
>> a 16-bit wchar_t on a little-endian platform obviously yields
>> different byte values than the same 16-bit wchar_t value on a
>> big-endian platform.
>>
>> In particular the latter point is not obvious at all from the normative
>> wording,
>> because an implementation can reasonably expect that all the encoding
>> specifies
>> is the sequence of numbers in an array of wchar_t (that's actually how
>> literal encoding is specified), and not how that maps to a sequence of
>> bytes.
>>
>
> Yes, that's also one of the reasons I think it's not a great idea to try
> to define some kind of "trivial" narrow <-> wide mapping. Better to
> let implementations that do that document it in the way that matches
> their platform's behavior.
>
>>
>> A few more thoughts here:
>>
>> - The IANA table has UTF-16BE and UTF-16LE, which is consistent with the
>> "byte array" interpretation, but it also has UTF-16, which should thus
>> never
>> appear as a result value of wide_literal(). I'd suggest making this
>> explicit in the wording.
>>
>
> UTF-16 signals UTF-16BE/UTF-16LE depending on platform endianness
>
>
>>
>> - We already know the endianness of the platform, so having the wide
>> encoding represent the platform endianness is redundant.
>>
>
> To make that clear:
>
> A registered character encoding is a character encoding scheme in the IANA
> Character Sets registry.
>
>>
>>
>> Jens
>>
>>
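For illustration, the "byte array" point in the quoted discussion can be sketched in C++ as follows. This is only a minimal sketch and not part of P1885: the names ByteOrder, native_order, object_bytes and serialized_bytes are hypothetical helpers made up for this example, and it assumes 8-bit bytes and a platform that is either little- or big-endian.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper type for this sketch; not a standard facility.
enum class ByteOrder { little, big };

// Detect the platform byte order at run time by inspecting the first
// byte of a known 16-bit value (assumes little- or big-endian only).
inline ByteOrder native_order() {
    const std::uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1 ? ByteOrder::little : ByteOrder::big;
}

// Step 3 of the quoted list: memory-copy a code unit into a byte array.
// The result depends on the platform's object representation.
template <class CodeUnit>
std::array<unsigned char, sizeof(CodeUnit)> object_bytes(CodeUnit u) {
    std::array<unsigned char, sizeof(CodeUnit)> out{};
    std::memcpy(out.data(), &u, sizeof u);
    return out;
}

// Serialize the same numeric value in an explicitly chosen byte order,
// which is what a byte-oriented label such as UTF-16LE or UTF-16BE
// describes (assumes 8-bit bytes).
template <class CodeUnit>
std::array<unsigned char, sizeof(CodeUnit)> serialized_bytes(CodeUnit u,
                                                             ByteOrder order) {
    std::array<unsigned char, sizeof(CodeUnit)> out{};
    const auto v = static_cast<unsigned long long>(u);
    for (std::size_t i = 0; i < sizeof(CodeUnit); ++i) {
        // i counts bytes from least significant to most significant.
        const std::size_t pos =
            (order == ByteOrder::little) ? i : sizeof(CodeUnit) - 1 - i;
        out[pos] = static_cast<unsigned char>(v >> (8 * i));
    }
    return out;
}
```

The same code unit value gives different byte sequences under the two orders, and the memcpy result matches whichever one is native. That is why an external library that is handed only bytes needs the byte order from somewhere, either from the encoding label or from separate knowledge of the platform.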
Received on 2021-09-24 03:18:06