Date: Wed, 8 Feb 2023 22:02:48 -0500
On Wed, Feb 8, 2023 at 10:00 PM Robin Leroy via SG16 <sg16_at_[hidden]>
wrote:
>
> On Thu, Feb 9, 2023 at 10:00, Corentin <corentin.jabot_at_[hidden]>
> wrote:
>
>> Does that mean that CESU-8 is not "a Unicode encoding form"? I.e., we want
>> to make sure to filter out conforming-but-not-specified-in-Unicode
>> encodings.
>>
> CESU-8 is an encoding scheme; but I would have interpreted the language in
> The Unicode Standard and in UTR #17 as meaning that the Unicode encoding
> forms are only the three UTFs.
>
> Indeed, the standard, like #17 quoted earlier, repeatedly uses the
> definite article with the term Unicode encoding forms, sometimes explicitly
> with the number three (the Standard has 11 occurrences of *the Unicode
> encoding forms*, and 8 occurrences of *the three Unicode encoding forms*).
>
> However, D79 quoted by Jens contradicts that usage.
>
> On Thu, Feb 9, 2023 at 09:56, Jens Maurer <jens.maurer_at_[hidden]> wrote:
>
>>
>> D79 A Unicode encoding form assigns each Unicode scalar value to a unique
>> code unit sequence.
>
>
> I have opened an issue for the Properties and Algorithms Group
> <https://www.unicode.org/consortium/props-algorithms.html> (reporting to
> the Unicode Technical Committee) to look into this.
>
Thank you. We look forward to harmonization between D79 and the usage in
UTR #17.
-- HT
>
>
>> So, any rule that maps Unicode scalar values to a unique code point
>> sequence is a Unicode encoding form. This certainly includes
>> UTF-8, UTF-16, and UTF-32, but it also includes CESU-8.
>
> (Aside, the above should say *to a unique code unit sequence*, and it is
> not clear to me that CESU-8 should be seen as an encoding form with 8-bit
> code units and a trivial encoding scheme, rather than an encoding scheme
> for the UTF-16 encoding form; the title of UTR #26 suggests the latter.)
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
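To make the CESU-8 question concrete, here is a minimal sketch of the
second reading discussed above, i.e. CESU-8 as a byte serialization of
UTF-16 code units, as I understand UTR #26. The helper name cesu8_encode
is hypothetical and not taken from any library or from this thread. Since
each scalar value still gets exactly one byte sequence, the same function
can also be read as a D79-style mapping with 8-bit code units, which is
the ambiguity being discussed.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8 (sketch, assuming the UTR #26 description).

    BMP scalar values are encoded exactly as in UTF-8. Supplementary
    scalar values are first split into a UTF-16 surrogate pair, and each
    surrogate code unit is then written as a three-byte sequence.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # Same bytes as UTF-8; raises UnicodeEncodeError for a lone
            # surrogate, which is not a Unicode scalar value.
            out += ch.encode("utf-8")
        else:
            v = cp - 0x10000
            high = 0xD800 | (v >> 10)    # leading surrogate
            low = 0xDC00 | (v & 0x3FF)   # trailing surrogate
            for unit in (high, low):
                out += bytes([
                    0xE0 | (unit >> 12),
                    0x80 | ((unit >> 6) & 0x3F),
                    0x80 | (unit & 0x3F),
                ])
    return bytes(out)


# U+10400 takes four bytes in UTF-8 but six in CESU-8.
print("\U00010400".encode("utf-8").hex(" "))
print(cesu8_encode("\U00010400").hex(" "))
```

The two encodings agree on every BMP scalar value and differ only for
supplementary ones.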
Received on 2023-02-09 03:03:17