sg16

Re: Rewording wording for named-universal-characters

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Sat, 26 Feb 2022 09:42:54 +0100
On Fri, Feb 25, 2022 at 11:33 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:

> On 25/02/2022 23.20, Corentin Jabot wrote:
> > Can we flip it around?
> >
> >
> > Then the named-universal-character designates the element of the
> translation character set whose UCS scalar value is equal to the code point
> of that character.
> > Otherwise, the program is ill-formed.
> > [Note: The lists of names and aliases are guaranteed to be disjoint. An
> n-char sequence will be found in at most one list. --end note]
>
> We want to avoid "matches" because it might mean "some fuzzy match"
> instead of
> equality.
>
> We want to start the paragraph with the same introducer as the preceding
> one.
>

This is challenging.
There are a lot of moving pieces.
Can we rewrite the previous paragraph too?


If the n-char-sequence of a named-universal-character is exactly equal to
one of
- the name alias of a character as specified in ISO/IEC 10646 clause 34
"Character names list",
- the associated name of a character as specified in ISO/IEC 10646 clause
34 "Character names list", or
- a control code alias of a character as specified in table X,
then the named-universal-character designates the code point of that
character.
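
To make the intended lookup concrete, here is a rough sketch of how an
implementation might resolve an n-char-sequence against the three lists. It
is illustration only, not proposed wording. The type NameEntry, the function
names, and the tiny tables are all hypothetical; a real implementation would
carry the full lists from ISO/IEC 10646 and table X, and the control code
entries shown are my assumption of what table X contains.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical entry type: one name paired with the code point it names.
struct NameEntry {
    std::string_view name;
    std::uint32_t code_point;
};

// Heavily truncated stand-ins for the three lists (assumed contents).
constexpr std::array<NameEntry, 2> associated_names{{
    {"LATIN SMALL LETTER A", 0x0061},
    {"PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET", 0xFE18},
}};
constexpr std::array<NameEntry, 1> name_aliases{{
    {"PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET", 0xFE18},
}};
constexpr std::array<NameEntry, 2> control_code_aliases{{
    {"NULL", 0x0000},
    {"LINE FEED", 0x000A},
}};

// Exact comparison only: no case folding, no whitespace or hyphen loosening.
template <std::size_t N>
std::optional<std::uint32_t> find_exact(const std::array<NameEntry, N>& list,
                                        std::string_view n_char_sequence) {
    for (const NameEntry& entry : list) {
        if (entry.name == n_char_sequence) return entry.code_point;
    }
    return std::nullopt;
}

// Returns the designated code point, or nullopt when no list contains the
// sequence (the case in which the program would be ill-formed).
std::optional<std::uint32_t> designated_code_point(std::string_view n_char_sequence) {
    std::optional<std::uint32_t> result;
    int hits = 0;
    auto search = [&](const auto& list) {
        if (auto cp = find_exact(list, n_char_sequence)) {
            result = cp;
            ++hits;
        }
    };
    search(name_aliases);
    search(associated_names);
    search(control_code_aliases);
    // The lists are expected to be disjoint, so at most one search can hit.
    assert(hits <= 1 && "name lists are expected to be disjoint");
    return result;
}
```

With these tables, designated_code_point("LINE FEED") yields 0x000A, while
"latin small letter a" yields nothing because the comparison is exact.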

A universal-character-name designates the character in the translation
character set whose UCS scalar value is:
  - For a universal-character-name of the form \u hex-quad or \U hex-quad
hex-quad, the hexadecimal number represented by the sequence of
hexadecimal-digits in the universal-character-name.
  - For a named-universal-character, the code point it designates.

If a universal-character-name does not designate a UCS scalar value, the
program is ill-formed.
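
For the hex forms, the check in the last sentence amounts to something like
the sketch below. Again this is only an illustration under my own
assumptions: the function names are hypothetical, and the input is taken to
be just the hexadecimal digits after \u or \U. A UCS scalar value is any
code point up to 0x10FFFF that is not a surrogate.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// A UCS scalar value is a code point in [0, 0x10FFFF] outside the
// surrogate range [0xD800, 0xDFFF].
constexpr bool is_ucs_scalar_value(std::uint32_t value) {
    return value <= 0x10FFFF && !(value >= 0xD800 && value <= 0xDFFF);
}

static_assert(is_ucs_scalar_value(0x1F600));
static_assert(!is_ucs_scalar_value(0xD800));

// Hypothetical helper: takes the 4 digits of \u hex-quad or the 8 digits of
// \U hex-quad hex-quad and returns the designated scalar value, or nullopt
// when the digits do not designate one (the ill-formed case).
std::optional<std::uint32_t> designated_scalar_value(std::string_view hex_digits) {
    if (hex_digits.size() != 4 && hex_digits.size() != 8) return std::nullopt;
    std::uint32_t value = 0;
    for (char c : hex_digits) {
        std::uint32_t digit = 0;
        if (c >= '0' && c <= '9') {
            digit = static_cast<std::uint32_t>(c - '0');
        } else if (c >= 'a' && c <= 'f') {
            digit = static_cast<std::uint32_t>(c - 'a') + 10;
        } else if (c >= 'A' && c <= 'F') {
            digit = static_cast<std::uint32_t>(c - 'A') + 10;
        } else {
            return std::nullopt;  // not a hexadecimal-digit
        }
        value = value * 16 + digit;  // 8 digits fit in 32 bits
    }
    if (!is_ucs_scalar_value(value)) return std::nullopt;
    return value;
}
```

So "0041" designates 0x41, whereas "D800" (a surrogate) and "00110000"
(beyond 0x10FFFF) designate nothing.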

>
> We want to quote the subclause title from ISO 10646 and drop :2020.
>
> Jens
>
>
> >
> > On Fri, Feb 25, 2022 at 10:50 PM Steve Downey <sdowney_at_[hidden]
> <mailto:sdowney_at_[hidden]>> wrote:
> >
> > The feedback was overall positive, however the paragraph
> >
> > A /named-universal-character/ designates the character in the
> translation character set
> >
> > * whose associated character name or character name alias as
> specified in ISO/IEC 10646:2020 clause 34 or
> > * whose control code alias in table X
> >
> > is the given /n-char-sequence/. The program is ill-formed if there
> is no such character.
> >
> > was seen as confusing. The goal is to say that we find the
> n-char-sequence in one of three lists, the 'associated character name'
> list, which is the immutable name for a code point, the 'character name
> alias' which is a short list of renamed code points where the immutable
> name is incorrect, and the list of control code aliases which do not have
> either 'associated character names' or 'character name aliases' but do
> have names specified in the Unicode Character Database. These names are all
> guaranteed to be distinct from one another.
> > This is documented in the Unicode Standard in Section 4.8, Names. In
> terms of the Unicode Character Database, we are using the Name property of
> assigned characters combined with 'correction', 'control', and 'alternate'
> types from NameAliases.txt. ISO 10646 marks correction and alternate in
> 34.5 Code charts and lists of character names with ※, but not the control
> aliases, which is why we have the table X.
> >
> > Suggestion:
> > The character in the translation character set designated by the
> named-universal-character which has the character code point where the
> n-char-sequence matches either the associated character name or character
> name alias, as specified in ISO 10646 "Code charts and lists of character
> names", or matches the control code alias for a code point in table X. If
> no name or alias matches the program is ill-formed. [Note: The lists of
> names and aliases are guaranteed to be disjoint. An n-char sequence will be
> found in at most one list. --end note]
> >
> >
> > Corentin also pointed out that N should be added to the list of
> characters that are not conditional-escape-sequence-char <
> http://eel.is/c++draft/lex.literal#nt:conditional-escape-sequence-char> in
> [lex.ccon]
> >
> > conditional-escape-sequence-char: <
> http://eel.is/c++draft/lex.literal#nt:conditional-escape-sequence-char>
> > any member of the basic character set that is not an /octal-digit/,
> a /simple-escape-sequence-char/, or the characters N, u, U, or x
> >
>
>

Received on 2022-02-26 08:43:06