On Fri, Feb 25, 2022 at 11:33 PM Jens Maurer <Jens.Maurer@gmx.net> wrote:
On 25/02/2022 23.20, Corentin Jabot wrote:
> Can we flip it around?
>

> Then the named-universal-character designates the element of the translation character set whose UCS scalar value is equal to the code point of that character.
> Otherwise, the program is ill-formed.
> [Note: The lists of names and aliases are guaranteed to be disjoint. An n-char sequence will be found in at most one list. --end note]

We want to avoid "matches" because it might mean "some fuzzy match" instead of
equality.

We want to start the paragraph with the same introducer as the preceding one.

This is challenging. 
There are a lot of moving pieces.
Can we rewrite the previous paragraph too?


If the n-char-sequence of a named-universal-character is exactly equal to one of
-  the character name alias of a character as specified in ISO/IEC 10646 clause 34 "Character names list",
-  the associated character name of a character as specified in ISO/IEC 10646 clause 34 "Character names list", or
-  a control code alias of a character as specified in table X,
then the named-universal-character designates the code point of that character.
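
To make the intended lookup concrete, here is a rough C++ sketch of "exactly equal to an entry in one of three lists". Everything in it is an assumption for illustration: the type NameEntry, the functions find_exact and designated_code_point, and the three tiny arrays, which hold only a few entries taken from the Unicode names data and stand in for the real lists and for table X.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical entry type: one name and the code point it belongs to.
struct NameEntry {
    std::string_view name;
    char32_t code_point;
};

// Illustrative excerpts only; the real lists are far larger.
constexpr std::array<NameEntry, 2> associated_names{{
    {"LATIN SMALL LETTER A", 0x0061},
    {"GREEK SMALL LETTER ALPHA", 0x03B1},
}};
constexpr std::array<NameEntry, 2> name_aliases{{
    {"LATIN CAPITAL LETTER GHA", 0x01A2},  // 'correction' alias
    {"BYTE ORDER MARK", 0xFEFF},           // 'alternate' alias
}};
constexpr std::array<NameEntry, 2> control_aliases{{  // stand-in for table X
    {"NULL", 0x0000},
    {"LINE FEED", 0x000A},
}};

// Exact, case-sensitive comparison: no fuzzy or loose matching.
template <std::size_t N>
std::optional<char32_t> find_exact(const std::array<NameEntry, N>& list,
                                   std::string_view n_char_sequence) {
    for (const NameEntry& entry : list) {
        if (entry.name == n_char_sequence) {
            return entry.code_point;
        }
    }
    return std::nullopt;
}

// Returns the designated code point, or nullopt where the program
// would be ill-formed. The search order is irrelevant because the
// three lists are disjoint.
std::optional<char32_t> designated_code_point(std::string_view n_char_sequence) {
    if (auto cp = find_exact(associated_names, n_char_sequence)) return cp;
    if (auto cp = find_exact(name_aliases, n_char_sequence)) return cp;
    return find_exact(control_aliases, n_char_sequence);
}
```

With this sketch, "LINE FEED" resolves through the control code aliases, while "latin small letter a" resolves to nothing because the comparison is exact.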

A universal-character-name designates the character in the translation character set whose UCS scalar value is:
  - For a universal-character-name of the form \u hex-quad or \U hex-quad hex-quad, the hexadecimal number represented by the sequence of hexadecimal-digits in the universal-character-name.
  - For a named-universal-character, the code point it designates.

If a universal-character-name does not designate a UCS scalar value, the program is ill-formed.
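
For the \u and \U forms, the rule above amounts to "compute the hexadecimal number, then reject anything that is not a UCS scalar value". A minimal sketch, assuming ASCII-like contiguous letters for the hex digits; the function names are made up for this mail and are not proposed wording:

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// A UCS scalar value is a code point in [0, 0x10FFFF] that is not a
// surrogate code point in [0xD800, 0xDFFF].
constexpr bool is_ucs_scalar_value(std::uint32_t value) {
    return value <= 0x10FFFF && !(value >= 0xD800 && value <= 0xDFFF);
}

// Hypothetical helper: the number represented by the hexadecimal-digits
// of a hex-quad (4 digits) or of two hex-quads (8 digits).
std::optional<std::uint32_t> hex_value(std::string_view digits) {
    if (digits.size() != 4 && digits.size() != 8) return std::nullopt;
    std::uint32_t value = 0;
    for (char c : digits) {
        std::uint32_t digit = 0;
        if (c >= '0' && c <= '9') digit = static_cast<std::uint32_t>(c - '0');
        else if (c >= 'a' && c <= 'f') digit = static_cast<std::uint32_t>(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') digit = static_cast<std::uint32_t>(c - 'A' + 10);
        else return std::nullopt;
        value = value * 16 + digit;
    }
    return value;
}

// Returns the designated character, or nullopt where the program
// would be ill-formed.
std::optional<char32_t> designated_by_hex_form(std::string_view digits) {
    const auto value = hex_value(digits);
    if (!value || !is_ucs_scalar_value(*value)) return std::nullopt;
    return static_cast<char32_t>(*value);
}
```

Under these assumptions, the digits 03B1 designate U+03B1, while D800 (a surrogate) and 00110000 (beyond U+10FFFF) are rejected.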


We want to quote the subclause title from ISO 10646 and drop :2020.

Jens


>
> On Fri, Feb 25, 2022 at 10:50 PM Steve Downey <sdowney@gmail.com <mailto:sdowney@gmail.com>> wrote:
>
>     The feedback was overall positive; however, the paragraph
>
>     A /named-universal-character/ designates the character in the translation character set
>
>       * whose associated character name or character name alias as specified in ISO/IEC 10646:2020 clause 34 or
>       * whose control code alias in table X
>
>     is the given /n-char-sequence/. The program is ill-formed if there is no such character.
>
>     was seen as confusing. The goal is to say that we find the n-char-sequence in one of three lists: the 'associated character name' list, which gives the immutable name for a code point; the 'character name alias' list, a short list of renamed code points whose immutable name is incorrect; and the list of control code aliases, for code points that have neither an 'associated character name' nor a 'character name alias' but do have names specified in the Unicode Character Database. These names are all guaranteed to be distinct from one another.
>     This is documented in the Unicode Standard in Section 4.8, Names. In terms of the Unicode Character Database, we are using the Name property of assigned characters combined with the 'correction', 'control', and 'alternate' types from NameAliases.txt. ISO 10646 marks correction and alternate aliases in 34.5 Code charts and lists of character names with ※, but not the control aliases, which is why we have table X.
>
>     Suggestion:
>     A named-universal-character designates the character in the translation character set whose code point is the one for which the n-char-sequence matches either the associated character name or the character name alias, as specified in ISO 10646 "Code charts and lists of character names", or matches the control code alias in table X. If no name or alias matches, the program is ill-formed. [Note: The lists of names and aliases are guaranteed to be disjoint. An n-char-sequence will be found in at most one list. --end note]
>
>
>     Corentin also pointed out that N should be added to the list of characters that are not conditional-escape-sequence-char <http://eel.is/c++draft/lex.literal#nt:conditional-escape-sequence-char> in [lex.ccon]
>
>     conditional-escape-sequence-char: <http://eel.is/c++draft/lex.literal#nt:conditional-escape-sequence-char>
>     any member of the basic character set that is not an /octal-digit/, a /simple-escape-sequence-char/, or the characters N, u, U, or x
>
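
As a quick check of the conditional-escape-sequence-char change quoted above, here is a small C++ sketch of the exclusion test with N added. It only covers the "is not ..." part of the definition; membership in the basic character set is deliberately left out, and the function names are made up for illustration.

```cpp
#include <string_view>

// octal-digit: one of 0 1 2 3 4 5 6 7
constexpr bool is_octal_digit(char c) {
    return c >= '0' && c <= '7';
}

// simple-escape-sequence-char: one of ' " ? \ a b f n r t v
constexpr bool is_simple_escape_sequence_char(char c) {
    return std::string_view{"'\"?\\abfnrtv"}.find(c) != std::string_view::npos;
}

// True if c can never be a conditional-escape-sequence-char under the
// amended definition, which adds N next to u, U, and x.
constexpr bool excluded_from_conditional_escape(char c) {
    return is_octal_digit(c) || is_simple_escape_sequence_char(c) ||
           c == 'N' || c == 'u' || c == 'U' || c == 'x';
}
```

With N excluded, \N can only start a named-universal-character and is never treated as a conditionally-supported escape sequence.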