Re: Term for "UTF-8, UTF-16 and UTF-32"

From: Robin Leroy <egg.robin.leroy_at_[hidden]>
Date: Thu, 9 Feb 2023 09:39:28 +0800

Dear Corentin,

I think you want to refer to *the Unicode encoding forms*.
See, for instance:
The Unicode Standard, Section 3.9, Unicode Encoding Forms

> The Unicode Standard supports three character encoding forms: UTF-32,
> UTF-16, and UTF-8.

Unicode Technical Report #17, Unicode Character Encoding Model, Section
5 Character Encoding Scheme (CES):

> Some of the Unicode encoding schemes have the same labels as the three
> Unicode encoding forms.

Note that *Unicode encodings specified in the Unicode standard* is a little
bit ambiguous, because Unicode distinguishes the encoding *forms* (code
points to code units) from the encoding *schemes* (code units to bytes). The
Unicode Standard supports seven encoding schemes: UTF-8, UTF-16, UTF-16BE,
UTF-16LE, UTF-32, UTF-32BE, and UTF-32LE. Assuming that the context here is
[format.string.escaped] in document P2736, it looks like you are indeed
dealing with the interpretation of code units (represented by the types
char8_t, char16_t, and char32_t, per [lex.string.literal] referenced in
[format.string.escaped]), and thus with encoding *forms*.
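
To make the form/scheme distinction concrete, here is a minimal C++ sketch. The function names (to_utf16_form, to_utf16be_scheme, to_utf16le_scheme) are hypothetical, chosen only for illustration, and the first function assumes its input is a valid Unicode scalar value. The first step is the encoding form (code point to code units); the other two are encoding schemes (code units to bytes).

```cpp
#include <cstdint>
#include <vector>

// Encoding FORM: maps one code point to UTF-16 code units.
// Assumption: cp is a Unicode scalar value (at most 0x10FFFF, not a surrogate).
std::vector<char16_t> to_utf16_form(char32_t cp) {
    if (cp < 0x10000) {
        // BMP code points are a single code unit.
        return {static_cast<char16_t>(cp)};
    }
    // Supplementary code points become a surrogate pair.
    const char32_t v = cp - 0x10000;
    return {static_cast<char16_t>(0xD800 + (v >> 10)),
            static_cast<char16_t>(0xDC00 + (v & 0x3FF))};
}

// Encoding SCHEME UTF-16BE: serializes code units to bytes, high byte first.
std::vector<std::uint8_t> to_utf16be_scheme(const std::vector<char16_t>& units) {
    std::vector<std::uint8_t> bytes;
    for (char16_t u : units) {
        bytes.push_back(static_cast<std::uint8_t>(u >> 8));
        bytes.push_back(static_cast<std::uint8_t>(u & 0xFF));
    }
    return bytes;
}

// Encoding SCHEME UTF-16LE: same code units, low byte first.
std::vector<std::uint8_t> to_utf16le_scheme(const std::vector<char16_t>& units) {
    std::vector<std::uint8_t> bytes;
    for (char16_t u : units) {
        bytes.push_back(static_cast<std::uint8_t>(u & 0xFF));
        bytes.push_back(static_cast<std::uint8_t>(u >> 8));
    }
    return bytes;
}
```

For U+1F600, the form step yields the code units D83D DE00 in either case; only the scheme step decides whether the bytes are D8 3D DE 00 or 3D D8 00 DE. Wording that talks about char16_t values stops after the first step, which is why the encoding *forms* are the relevant term.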

Best regards,

Robin Leroy

On Wed, 8 Feb 2023 at 00:32, Corentin <corentin.jabot_at_[hidden]> wrote:

> Hey Robin,
> How are you?
> Does Unicode have a term to designate "UTF-8, UTF-16 and UTF-32", i.e.
> Unicode encodings specified in the Unicode standard - excluding things like
> CESU-8 for example?
> It's something we would find useful in the C++ specification.
> Thanks,
> Corentin

Received on 2023-02-09 01:39:46