On Wed, Feb 8, 2023 at 10:00 PM Robin Leroy via SG16 <sg16@lists.isocpp.org> wrote:

On Thu, Feb 9, 2023 at 10:00, Corentin <corentin.jabot@gmail.com> wrote:
Does that mean that CESU-8 is not "a Unicode encoding form"? I.e., we want to make sure to filter out encodings that are conforming but not specified in Unicode.
CESU-8 is an encoding scheme; but I would have interpreted the language in The Unicode Standard and in UTR #17 as meaning that the Unicode encoding forms are only the three UTFs.

Indeed, the Standard, like UTR #17 quoted earlier, repeatedly uses the definite article with the term "Unicode encoding forms", sometimes explicitly with the number three (the Standard has 11 occurrences of "the Unicode encoding forms", and 8 occurrences of "the three Unicode encoding forms").

However, D79 quoted by Jens contradicts that usage.

On Thu, Feb 9, 2023 at 09:56, Jens Maurer <jens.maurer@gmx.net> wrote:

D79 A Unicode encoding form assigns each Unicode scalar value to a unique code unit sequence.

I have opened an issue for the Properties and Algorithms Group (reporting to the Unicode Technical Committee) to look into this.

Thank you. We look forward to harmonization between D79 and the usage in UTR #17.

-- HT

So, any rule that maps Unicode scalar values to a unique code point
sequence is a Unicode encoding form. This certainly includes
UTF-8, UTF-16, and UTF-32, but it also includes CESU-8.
(Aside: the above should say "to a unique code unit sequence". It is also not clear to me whether CESU-8 should be seen as an encoding form with 8-bit code units and a trivial encoding scheme, or as an encoding scheme for the UTF-16 encoding form; the title of UTR #26 suggests the latter.)
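
To make the D79 reading concrete, here is a minimal Python sketch of CESU-8 viewed as a mapping from Unicode scalar values to sequences of 8-bit code units, next to UTF-8. The function names (utf8_encode, cesu8_encode, _three_byte) are made up for illustration and are not taken from any standard or library; the CESU-8 rule follows my understanding of UTR #26: BMP scalar values are encoded as in UTF-8, and supplementary ones as their UTF-16 surrogate pair, with each surrogate written in the three-byte UTF-8 pattern.

```python
def _three_byte(code_unit: int) -> bytes:
    """Write a 16-bit value in the three-byte UTF-8 bit pattern (hypothetical helper)."""
    return bytes([
        0xE0 | (code_unit >> 12),
        0x80 | ((code_unit >> 6) & 0x3F),
        0x80 | (code_unit & 0x3F),
    ])


def utf8_encode(scalar: int) -> bytes:
    """UTF-8 code unit sequence for one scalar value, via Python's built-in codec."""
    return chr(scalar).encode("utf-8")


def cesu8_encode(scalar: int) -> bytes:
    """CESU-8 code unit sequence for one scalar value (illustrative sketch only)."""
    if not 0 <= scalar <= 0x10FFFF or 0xD800 <= scalar <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if scalar < 0x10000:
        # BMP scalar values: identical to UTF-8.
        return chr(scalar).encode("utf-8")
    # Supplementary scalar values: form the UTF-16 surrogate pair first,
    # then write each surrogate as three bytes.
    offset = scalar - 0x10000
    high = 0xD800 | (offset >> 10)
    low = 0xDC00 | (offset & 0x3FF)
    return _three_byte(high) + _three_byte(low)


if __name__ == "__main__":
    for scalar in (0x41, 0xE9, 0x20AC, 0x10400):
        print(f"U+{scalar:04X}",
              utf8_encode(scalar).hex(" "),
              "|",
              cesu8_encode(scalar).hex(" "))
```

Each scalar value yields exactly one byte sequence under this rule, which is all that D79 as worded seems to require; the sketch does not settle whether CESU-8 is better described as an encoding form or as an encoding scheme for UTF-16.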
--
SG16 mailing list
SG16@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg16