Date: Thu, 23 Sep 2021 14:00:02 +0200
On 23/09/2021 13.10, Peter Brett wrote:
>> -----Original Message-----
>> From: SG16 <sg16-bounces_at_[hidden]> On Behalf Of Jens Maurer via SG16
>> Sent: 23 September 2021 11:18
>> To: Corentin <corentin.jabot_at_[hidden]>
>
>>> "Obviously broken" is a rather big claim in the absence of suitable
>>> alternatives.
>>> I believe the use of the IANA registry is motivated by the paper and
>>> previous polls.
>>
>> I think the polls are not sufficiently precise to argue for the case
>> that what the IANA table describes (by implication) as a narrow encoding
>> cannot be re-used to designate a wide encoding trivially derived from the
>> narrow encoding.
>>
>
> This is a good thing.
>
> If I:
>
> 1. Obtain the wide literal encoding, E, with the P1885 facility
> 2. Obtain a wide string literal.
> 3. Memory copy the string literal into a byte array.
> 4. Ask an external library [1] "is this byte array validly encoded with this encoding, E".

Ok, then the encoding value does need to represent the narrow/wide
differentiation, and also the endianness on the platform, because
a 16-bit wchar_t on a little-endian platform obviously yields
different byte values than the same 16-bit wchar_t value on a
big-endian platform.

In particular the latter point is not obvious at all from the normative wording,
because an implementation can reasonably expect that all the encoding specifies
is the sequence of numbers in an array of wchar_t (that's actually how
literal encoding is specified), and not how that maps to a sequence of bytes.
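To make the byte-level difference concrete, here is a minimal sketch, not
part of P1885. It uses char16_t as a stand-in for a 16-bit wchar_t, and
byte_order, object_bytes, serialized and native_byte_order are hypothetical
helper names. It assumes 8-bit bytes and a platform that is either
little- or big-endian.

```cpp
#include <array>
#include <cassert>
#include <cstring>  // std::memcpy

// Hypothetical helper type; not from P1885 or the standard library.
enum class byte_order { little, big };

using bytes2 = std::array<unsigned char, 2>;

// Step 3 of the quoted list: copy the object representation of one
// 16-bit code unit into a byte array.
inline bytes2 object_bytes(char16_t unit) {
    static_assert(sizeof(char16_t) == 2, "sketch assumes 8-bit bytes");
    bytes2 out{};
    std::memcpy(out.data(), &unit, out.size());
    return out;
}

// The bytes the same code unit value has under a given byte order.
inline bytes2 serialized(char16_t unit, byte_order order) {
    const unsigned char hi = static_cast<unsigned char>(unit >> 8);
    const unsigned char lo = static_cast<unsigned char>(unit & 0xFF);
    return order == byte_order::little ? bytes2{{lo, hi}} : bytes2{{hi, lo}};
}

// Detect the platform byte order by inspecting a known value.
inline byte_order native_byte_order() {
    const bytes2 probe = object_bytes(char16_t{0x0102});
    assert(probe[0] == 0x01 || probe[0] == 0x02);
    return probe[0] == 0x02 ? byte_order::little : byte_order::big;
}
```

The code unit value 0x20AC is the same number on both kinds of platform,
but serialized() gives AC 20 for little-endian and 20 AC for big-endian,
and object_bytes() matches whichever of the two is native.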

A few more thoughts here:

- The IANA table has UTF-16BE and UTF-16LE, which is consistent with the
"byte array" interpretation, but it also has UTF-16, which should thus never
appear as a result value of wide_literal(). I'd suggest making this
explicit in the wording.

- We already know the endianness of the platform, so having the wide
encoding represent the platform endianness is redundant.
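As a rough illustration of the redundancy argument (a sketch only;
byte_order, native_byte_order and byte_level_name are hypothetical names,
not part of P1885), a caller who needs the byte-level IANA name could
derive it from a byte-order-agnostic name plus the platform byte order:

```cpp
#include <cassert>
#include <cstring>  // std::memcpy
#include <string>

// Hypothetical helper type; not from P1885 or the standard library.
enum class byte_order { little, big };

// Detect the platform byte order by inspecting the first byte of a known
// value. Assumes a platform that is either little- or big-endian.
inline byte_order native_byte_order() {
    const unsigned short probe = 0x0102;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 0x02 ? byte_order::little : byte_order::big;
}

// Combine a code-unit-level encoding name with a byte order to obtain the
// byte-level name. Only the names UTF-16 and UTF-32 are handled; any other
// name is returned unchanged.
inline std::string byte_level_name(const std::string& name, byte_order order) {
    assert(!name.empty());
    if (name == "UTF-16" || name == "UTF-32")
        return name + (order == byte_order::little ? "LE" : "BE");
    return name;
}
```

With this, byte_level_name("UTF-16", native_byte_order()) yields UTF-16LE
or UTF-16BE depending on the platform, without the literal encoding itself
having to carry that information.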

Jens
Received on 2021-09-23 07:00:16