On Thu, 9 Feb 2023 at 10:00, Corentin <corentin.jabot@gmail.com> wrote:

Does that mean that CESU-8 is not "a Unicode encoding form"? I.e., we want to make sure to filter out conforming-but-not-specified-in-Unicode encodings.

CESU-8 is an encoding scheme, but I would have interpreted the language in The Unicode Standard and in UTR #17 as meaning that the Unicode encoding forms are only the three UTFs.

Indeed, the Standard, like UTR #17 quoted earlier, repeatedly uses the definite article with the term "Unicode encoding forms", sometimes explicitly with the number three (the Standard has 11 occurrences of "the Unicode encoding forms" and 8 occurrences of "the three Unicode encoding forms").

However, D79 quoted by Jens contradicts that usage.

On Thu, 9 Feb 2023 at 09:56, Jens Maurer <jens.maurer@gmx.net> wrote:

D79 A Unicode encoding form assigns each Unicode scalar value to a unique code unit sequence.

I have opened an issue for the Properties and Algorithms Group (reporting to the Unicode Technical Committee) to look into this.

So, any rule that maps Unicode scalar values to a unique code point
sequence is a Unicode encoding form. This certainly includes
UTF-8, UTF-16, and UTF-32, but it also includes CESU-8.

(Aside: the above should say "to a unique code unit sequence", and it is not clear to me that CESU-8 should be seen as an encoding form with 8-bit code units and a trivial encoding scheme, rather than as an encoding scheme for the UTF-16 encoding form; the title of UTR #26 suggests the latter.)
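
To make the two readings concrete, here is a minimal Python sketch of CESU-8 as I understand it from UTR #26: each scalar value is first expressed as UTF-16 code units, and each 16-bit code unit is then serialized with the UTF-8 bit pattern. The function name cesu8_encode is hypothetical (Python has no built-in CESU-8 codec), and the sketch is an illustration under that assumption, not a reference implementation.

```python
def cesu8_encode(text: str) -> bytes:
    """Hypothetical helper: encode text as CESU-8 (sketch based on UTR #26)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # BMP scalar values have the same bytes as in UTF-8.
            out += ch.encode("utf-8", "surrogatepass")
        else:
            # Supplementary scalar values: form the UTF-16 surrogate pair,
            # then serialize each 16-bit code unit as a 3-byte sequence.
            v = cp - 0x10000
            high = 0xD800 + (v >> 10)
            low = 0xDC00 + (v & 0x3FF)
            for unit in (high, low):
                out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


# U+10400 is four bytes in UTF-8 but six bytes in CESU-8.
utf8_bytes = "\U00010400".encode("utf-8")
cesu8_bytes = cesu8_encode("\U00010400")
print(utf8_bytes.hex(" "))
print(cesu8_bytes.hex(" "))
```

Either way, every scalar value gets exactly one byte sequence, which is why CESU-8 satisfies the wording of D79 read literally; whether the bytes count as 8-bit code units of an encoding form or as a serialization of UTF-16 code units is exactly the question raised above.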