Rewording wording for named-universal-characters

From: Steve Downey <sdowney_at_[hidden]>
Date: Fri, 25 Feb 2022 16:49:57 -0500
The feedback was positive overall; however, the paragraph

A *named-universal-character* designates the character in the translation
character set

   - whose associated character name or character name alias as specified
   in ISO/IEC 10646:2020 clause 34 or
   - whose control code alias in table X

is the given *n-char-sequence*. The program is ill-formed if there is no
such character.

was seen as confusing. The goal is to say that we look up the
n-char-sequence in one of three lists: the 'associated character name'
list, which holds the immutable name of each code point; the 'character
name alias' list, a short list of renamed code points where the immutable
name is incorrect; and the list of control code aliases, for code points
that have neither an 'associated character name' nor a 'character name
alias' but do have names specified in the Unicode Character Database. These
names are all guaranteed to be distinct from one another.
This is documented in the Unicode Standard in Section 4.8, Names. In terms
of the Unicode Character Database, we are using the Name property of
assigned characters combined with 'correction', 'control', and 'alternate'
types from NameAliases.txt. ISO 10646 marks the correction and alternate
aliases with ※ in 34.5, "Code charts and lists of character names", but not
the control aliases, which is why we have table X.
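
As a rough illustration of the three-list lookup described above, here is a
minimal C++17 sketch. The struct, table, and function names are hypothetical,
the tables hold only a handful of entries, and exact string comparison is an
assumption made for the sketch; it is not how any compiler implements this.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical entry type: a name and the code point it designates.
struct Entry {
    std::string_view name;
    char32_t code_point;
};

// Tiny excerpts only; the real lists are far larger.
// 'Associated character name' (the immutable Name property).
constexpr std::array<Entry, 2> character_names{{
    {"LATIN SMALL LETTER A", 0x0061},
    {"PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET", 0xFE18},
}};

// 'Character name alias' (here, a correction of the misspelled name above).
constexpr std::array<Entry, 1> character_name_aliases{{
    {"PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET", 0xFE18},
}};

// Control code aliases (the role played by "table X").
constexpr std::array<Entry, 2> control_code_aliases{{
    {"NULL", 0x0000},
    {"LINE FEED", 0x000A},
}};

// Linear search of one list; exact match is an assumption of this sketch.
template <std::size_t N>
constexpr std::optional<char32_t> find_in(const std::array<Entry, N>& list,
                                          std::string_view name) {
    for (const Entry& e : list) {
        if (e.name == name) return e.code_point;
    }
    return std::nullopt;
}

// Search the three lists in turn. Because the names are distinct across
// lists, the order of the searches does not affect the result.
// An empty result corresponds to "the program is ill-formed".
constexpr std::optional<char32_t> lookup_named_character(
        std::string_view n_char_sequence) {
    if (auto cp = find_in(character_names, n_char_sequence)) return cp;
    if (auto cp = find_in(character_name_aliases, n_char_sequence)) return cp;
    return find_in(control_code_aliases, n_char_sequence);
}
```

With these excerpts, both spellings of the U+FE18 name resolve to the same
code point, "LINE FEED" is found only via the control code aliases, and an
unknown name yields an empty result.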

The named-universal-character designates the character in the translation
character set whose code point is the one for which the n-char-sequence
matches either the associated character name or character name alias, as
specified in ISO 10646 "Code charts and lists of character names", or
matches the control code alias in table X. If no name or alias matches, the
program is ill-formed. [Note: The lists of names and aliases are guaranteed
to be disjoint. An n-char-sequence will be found in at most one list. --end
note]

Corentin also pointed out that N should be added to the list of characters
that are not a conditional-escape-sequence-char
<http://eel.is/c++draft/lex.literal#nt:conditional-escape-sequence-char> in:
any member of the basic character set that is not an *octal-digit*, a
*simple-escape-sequence-char*, or the characters N, u, U, or x
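
To make the exclusion concrete, here is a small C++17 sketch of the "is not"
part of that rule. The function names are hypothetical, and membership in the
basic character set is deliberately not modeled; only the excluded characters
are checked.

```cpp
#include <string_view>

// octal-digit: one of 0 1 2 3 4 5 6 7
constexpr bool is_octal_digit(char c) {
    return c >= '0' && c <= '7';
}

// simple-escape-sequence-char: one of ' " ? \ a b f n r t v
constexpr bool is_simple_escape_sequence_char(char c) {
    constexpr std::string_view chars = "'\"?\\abfnrtv";
    return chars.find(c) != std::string_view::npos;
}

// True if c can never be a conditional-escape-sequence-char, with N
// included alongside u, U, and x as suggested above.
constexpr bool is_excluded_from_conditional_escape(char c) {
    return is_octal_digit(c) || is_simple_escape_sequence_char(c) ||
           c == 'N' || c == 'u' || c == 'U' || c == 'x';
}
```

Under this sketch 'N' is excluded, so \N always begins a
named-universal-character, while a character such as 'q' is not excluded.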

Received on 2022-02-25 21:50:13