sg16

Re: Rewording wording for named-universal-characters

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Fri, 25 Feb 2022 23:20:02 +0100
Can we flip it around?

If the n-char-sequence of a named-universal-character matches one of:
    - the character name alias of a character as specified in
      ISO/IEC 10646:2020 clause 34,
    - the associated character name of a character as specified in
      ISO/IEC 10646:2020 clause 34, or
    - a control code alias as specified in table X,
then the named-universal-character designates the element of the
translation character set whose UCS scalar value is equal to the code point
of that character.
Otherwise, the program is ill-formed.
[Note: The lists of names and aliases are guaranteed to be disjoint. An
n-char-sequence will be found in at most one list. --end note]


On Fri, Feb 25, 2022 at 10:50 PM Steve Downey <sdowney_at_[hidden]> wrote:

> The feedback was positive overall; however, the paragraph
>
> A *named-universal-character* designates the character in the translation
> character set
>
> - whose associated character name or character name alias as specified
> in ISO/IEC 10646:2020 clause 34 or
> - whose control code alias in table X
>
> is the given *n-char-sequence*. The program is ill-formed if there is no
> such character.
>
> was seen as confusing. The goal is to say that we find the n-char-sequence
> in one of three lists: the 'associated character name' list, which gives
> the immutable name of a code point; the 'character name alias' list, a
> short list of renamed code points whose immutable name is incorrect; and
> the list of control code aliases, for code points that have neither an
> 'associated character name' nor a 'character name alias' but do have names
> specified in the Unicode Character Database. These names are all
> guaranteed to be distinct from one another.
> This is documented in the Unicode Standard, Section 4.8, Names. In terms
> of the Unicode Character Database, we are using the Name property of
> assigned characters combined with the 'correction', 'control', and
> 'alternate' alias types from NameAliases.txt. ISO/IEC 10646 marks the
> correction and alternate aliases with ※ in 34.5, Code charts and lists of
> character names, but not the control aliases, which is why we need table X.
>
> Suggestion:
> The character in the translation character set designated by the
> named-universal-character which has the character code point where the
> n-char-sequence matches either the associated character name or character
> name alias, as specified in ISO 10646 "Code charts and lists of character
> names", or matches the control code alias for a code point in table X. If
> no name or alias matches, the program is ill-formed. [Note: The lists of
> names and aliases are guaranteed to be disjoint. An n-char-sequence will be
> found in at most one list. --end note]
>
>
> Corentin also pointed out that N should be added to the list of characters
> excluded from conditional-escape-sequence-char
> <http://eel.is/c++draft/lex.literal#nt:conditional-escape-sequence-char> in
> [lex.ccon]:
>
> conditional-escape-sequence-char:
> <http://eel.is/c++draft/lex.literal#nt:conditional-escape-sequence-char>
> any member of the basic character set that is not an *octal-digit*, a
> *simple-escape-sequence-char*, or the characters N, u, U, or x
>

Received on 2022-02-25 22:20:15