sg16

Re: Rewording wording for named-universal-characters

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Fri, 25 Feb 2022 23:33:47 +0100
On 25/02/2022 23.20, Corentin Jabot wrote:
> Can we flip it around?
>
> If the n-char-sequence of a named-universal-character matches either:
> - The name alias of a character as specified in ISO/IEC 10646:2020 clause 34
> - The associated name of a character as specified in ISO/IEC 10646:2020 clause 34
> - A control code alias as specified in table X
> Then the named-universal-character designates the element of the translation character set whose UCS scalar value is equal to the code point of that character.
> Otherwise, the program is ill-formed.
> [Note: The lists of names and aliases are guaranteed to be disjoint. An n-char-sequence will be found in at most one list. --end note]

We want to avoid "matches" because it might mean "some fuzzy match" instead of
equality.
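
To illustrate the distinction between equality and a looser "match", here is a minimal C++ sketch of an exact-equality lookup across three disjoint lists. The table contents are a tiny hand-picked sample, and the names `NameEntry`, `find_in`, and `lookup_exact` are hypothetical helpers for illustration only, not part of any proposed wording or implementation.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical entry type: one name and the code point it designates.
struct NameEntry {
    std::string_view name;
    char32_t code_point;
};

// Tiny illustrative samples of the three lists discussed in this thread.
constexpr NameEntry kAssociatedNames[] = {
    {"LATIN SMALL LETTER A", 0x0061},
    {"LATIN CAPITAL LETTER OI", 0x01A2},
};
constexpr NameEntry kNameAliases[] = {
    {"LATIN CAPITAL LETTER GHA", 0x01A2},  // correction alias for U+01A2
};
constexpr NameEntry kControlCodeAliases[] = {
    {"NULL", 0x0000},
    {"LINE FEED", 0x000A},
};

// Exact comparison only: no case folding, no whitespace or hyphen loosening.
template <std::size_t N>
std::optional<char32_t> find_in(const NameEntry (&table)[N],
                                std::string_view name) {
    for (const NameEntry& entry : table) {
        if (entry.name == name) {
            return entry.code_point;
        }
    }
    return std::nullopt;
}

// The lists are disjoint, so the order in which they are searched
// cannot change the result.
std::optional<char32_t> lookup_exact(std::string_view n_char_sequence) {
    if (auto cp = find_in(kAssociatedNames, n_char_sequence)) return cp;
    if (auto cp = find_in(kNameAliases, n_char_sequence)) return cp;
    if (auto cp = find_in(kControlCodeAliases, n_char_sequence)) return cp;
    return std::nullopt;  // corresponds to "the program is ill-formed"
}
```

Under this sketch, `lookup_exact("LINE FEED")` finds U+000A, while `lookup_exact("line feed")` finds nothing, which is the behaviour that "equality" pins down and "matches" leaves open.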

We want to start the paragraph with the same introducer as the preceding one.

We want to quote the subclause title from ISO 10646 and drop :2020.

Jens


>
> On Fri, Feb 25, 2022 at 10:50 PM Steve Downey <sdowney_at_[hidden]> wrote:
>
> The feedback was overall positive, however the paragraph
>
> A /named-universal-character/ designates the character in the translation character set
>
> * whose associated character name or character name alias as specified in ISO/IEC 10646:2020 clause 34 or
> * whose control code alias in table X
>
> is the given /n-char-sequence/. The program is ill-formed if there is no such character.
>
> was seen as confusing. The goal is to say that we find the n-char-sequence in one of three lists: the 'associated character name' list, which holds the immutable name of each code point; the 'character name alias' list, a short list of renamed code points whose immutable name is incorrect; and the list of control code aliases, for code points that have neither an 'associated character name' nor a 'character name alias' but do have names specified in the Unicode Character Database. These names are all guaranteed to be distinct from one another.
> This is documented in the Unicode Standard in Section 4.8, Names. In terms of the Unicode Character Database, we are using the Name property of assigned characters combined with 'correction', 'control', and 'alternate' types from NameAliases.txt. ISO 10646 marks correction and alternate in 34.5 Code charts and lists of character names with ※, but not the control aliases, which is why we have the table X.
>
> Suggestion:
> The named-universal-character designates the character in the translation character set whose code point is the one for which the n-char-sequence matches either the associated character name or character name alias, as specified in ISO 10646 "Code charts and lists of character names", or matches the control code alias in table X. If no name or alias matches, the program is ill-formed. [Note: The lists of names and aliases are guaranteed to be disjoint. An n-char-sequence will be found in at most one list. --end note]
>
>
> Corentin also pointed out that N should be added to the list of characters that are not conditional-escape-sequence-char <http://eel.is/c++draft/lex.literal#nt:conditional-escape-sequence-char> in [lex.ccon]
>
> conditional-escape-sequence-char:
> any member of the basic character set that is not an /octal-digit/, a /simple-escape-sequence-char/, or the characters N, u, U, or x
>
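
The amended grammar rule above can be sketched as a small C++ predicate. This is only an illustration under stated assumptions: the contents of `kBasicCharacterSet` are an assumed listing of the 96-member basic character set as drafted around the time of this thread, and all identifiers here are hypothetical names chosen for the example.

```cpp
#include <string_view>

// Assumption: the basic character set as drafted at the time of this thread
// (space, horizontal tab, vertical tab, form feed, new-line, and 91 graphic
// characters).
constexpr std::string_view kBasicCharacterSet =
    " \t\v\f\n"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";

constexpr std::string_view kOctalDigits = "01234567";

// simple-escape-sequence-char: one of ' " ? \ a b f n r t v
constexpr std::string_view kSimpleEscapeChars = "'\"?\\abfnrtv";

// Characters that introduce other escape forms, with N added as suggested.
constexpr std::string_view kOtherEscapeIntroducers = "NuUx";

bool in_set(std::string_view set, char c) {
    return set.find(c) != std::string_view::npos;
}

// True if c could follow a backslash as a conditional-escape-sequence-char
// under the amended rule.
bool is_conditional_escape_sequence_char(char c) {
    return in_set(kBasicCharacterSet, c) &&
           !in_set(kOctalDigits, c) &&
           !in_set(kSimpleEscapeChars, c) &&
           !in_set(kOtherEscapeIntroducers, c);
}
```

With N excluded, `\N` can no longer be read as a conditional escape sequence, so it is free to introduce a named-universal-character, whereas a character such as `q` or `8` still qualifies.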

Received on 2022-02-25 22:33:55