sg16

Re: Rewording wording for named-universal-characters

From: Steve Downey <sdowney_at_[hidden]>
Date: Tue, 8 Mar 2022 00:09:24 -0500
I believe all the formatting infelicities are now fixed, and I've uploaded
the revised paper to the wiki at
https://wiki.edg.com/pub/Wg21telecons2022/Teleconference2022-03-11/d2071r2.html
The actual paper has proper m-dash bullets for the
*universal-character-name* text, but they don't seem to have survived the
paste into Gmail below.
11 Wording

These changes are relative to [N4901] “Working Draft, Standard for Programming Language C++”
<https://wiki.edg.com/pub/Wg21telecons2022/Teleconference2022-03-11/d2071r2.html#ref-N4901>.

Modify [lex.charset]

(lex.charset.3) The *universal-character-name* construct provides a way to
name other characters.


              *n-char*: one of
                     A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
                     0 1 2 3 4 5 6 7 8 9
                     U+002D HYPHEN-MINUS
                     U+0020 SPACE

              *n-char-sequence*:
                     *n-char*
                     *n-char-sequence* *n-char*

              *named-universal-character*:
                     \N { *n-char-sequence* }

              *hex-quad*:
                     *hexadecimal-digit* *hexadecimal-digit* *hexadecimal-digit* *hexadecimal-digit*

              *universal-character-name*:
                     \u *hex-quad*
                     \U *hex-quad* *hex-quad*
                     *named-universal-character*
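To make the grammar above concrete, here is a minimal C++17 sketch of how a lexer might recognize a *named-universal-character* under this draft's *n-char* set (uppercase Latin letters, decimal digits, U+002D HYPHEN-MINUS, U+0020 SPACE). The names `is_n_char` and `match_named_ucn` are hypothetical and not taken from any implementation; the sketch checks only the shape `\N{ n-char-sequence }` and does not look the name up.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// n-char under this draft grammar: A-Z, 0-9, U+002D HYPHEN-MINUS, U+0020 SPACE.
constexpr bool is_n_char(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '-' ||
           c == ' ';
}

// Hypothetical lexer helper: if `src` starts with \N{ n-char-sequence },
// return the n-char-sequence; otherwise return std::nullopt.
// Only the shape is checked here; the name is not looked up.
constexpr std::optional<std::string_view> match_named_ucn(std::string_view src) {
    constexpr std::string_view intro = "\\N{";
    if (src.substr(0, intro.size()) != intro) return std::nullopt;

    std::size_t i = intro.size();
    while (i < src.size() && is_n_char(src[i])) ++i;  // consume n-chars

    // The n-char-sequence must be non-empty and closed by '}'.
    if (i == intro.size() || i == src.size() || src[i] != '}')
        return std::nullopt;
    return src.substr(intro.size(), i - intro.size());
}
```

Under this grammar `\N{latin small letter a}` is rejected outright, since lowercase letters are not *n-char*s.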

A *universal-character-name* of the form \u *hex-quad* or \U *hex-quad*
*hex-quad* designates the character in the translation character set whose
UCS scalar value is the hexadecimal number represented by the sequence of
*hexadecimal-digits* in the *universal-character-name*. The program is
ill-formed if that number is not a UCS scalar value.
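The scalar-value check in the paragraph above can be sketched as follows, assuming the usual definition of a UCS scalar value (any code point from U+0000 to U+10FFFF outside the surrogate range U+D800 to U+DFFF). `decode_hex_ucn` is a hypothetical helper; for simplicity it returns `std::nullopt` both for malformed input and for values the wording makes ill-formed.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// A UCS scalar value is at most U+10FFFF and is not a surrogate.
constexpr bool is_ucs_scalar_value(std::uint32_t v) {
    return v <= 0x10FFFF && !(v >= 0xD800 && v <= 0xDFFF);
}

// Value of one hexadecimal-digit, or -1 if c is not one.
constexpr int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: decode a complete "\uXXXX" or "\UXXXXXXXX".
constexpr std::optional<std::uint32_t> decode_hex_ucn(std::string_view ucn) {
    if (ucn.size() < 2 || ucn[0] != '\\') return std::nullopt;
    std::size_t digits = ucn[1] == 'u' ? 4 : ucn[1] == 'U' ? 8 : 0;
    if (digits == 0 || ucn.size() != 2 + digits) return std::nullopt;

    std::uint32_t v = 0;
    for (std::size_t i = 2; i < ucn.size(); ++i) {
        int d = hex_value(ucn[i]);
        if (d < 0) return std::nullopt;
        v = v * 16 + static_cast<std::uint32_t>(d);
    }
    // Per the wording, a number that is not a UCS scalar value is ill-formed.
    if (!is_ucs_scalar_value(v)) return std::nullopt;
    return v;
}
```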

A *universal-character-name* that is a *named-universal-character* designates
the character named by its *n-char-sequence*. A character is so named if
the *n-char-sequence* is equal to

   - the associated character name or associated character name alias
   specified in ISO/IEC 10646 subclause “Code charts and lists of character
   names” or
   - the control code alias given in Table X.

[Note: The aliases in Table X are provided for control characters which
otherwise have no associated character name or associated character name
alias. These names are derived from the Unicode Character Database’s
NameAliases.txt. For historical reasons, control characters are formally
unnamed. – end note]
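A sketch of the lookup the wording describes, taking "equal to" as exact, case-sensitive string equality (the thread below deliberately avoids "matches" so as not to suggest a fuzzy match). The four-entry table and the names `NamedChar`, `kNames`, and `lookup_name` are hypothetical and for illustration only; a real implementation would presumably generate its data from the ISO/IEC 10646 names list and aliases plus Table X.

```cpp
#include <array>
#include <cstdint>
#include <optional>
#include <string_view>

struct NamedChar {
    std::string_view name;
    std::uint32_t code_point;
};

// Hypothetical excerpt: two character names and two control code aliases.
constexpr std::array<NamedChar, 4> kNames{{
    {"LATIN SMALL LETTER E WITH ACUTE", 0x00E9},
    {"GREEK SMALL LETTER ALPHA", 0x03B1},
    {"NULL", 0x0000},       // control code alias (Table X)
    {"LINE FEED", 0x000A},  // control code alias (Table X)
}};

// Exact comparison: no case folding, no whitespace or hyphen normalization.
constexpr std::optional<std::uint32_t> lookup_name(std::string_view n_char_sequence) {
    for (const auto& entry : kNames)
        if (entry.name == n_char_sequence) return entry.code_point;
    return std::nullopt;  // no character is so named
}
```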

Table X

*Code point*  *Control Code Alias*
U+0000 NULL
U+0001 START OF HEADING
U+0002 START OF TEXT
U+0003 END OF TEXT
U+0004 END OF TRANSMISSION
U+0005 ENQUIRY
U+0006 ACKNOWLEDGE
U+0007 ALERT
U+0008 BACKSPACE
U+0009 CHARACTER TABULATION
U+0009 HORIZONTAL TABULATION
U+000A LINE FEED
U+000A NEW LINE
U+000A END OF LINE
U+000B LINE TABULATION
U+000B VERTICAL TABULATION
U+000C FORM FEED
U+000D CARRIAGE RETURN
U+000E SHIFT OUT
U+000E LOCKING-SHIFT ONE
U+000F SHIFT IN
U+000F LOCKING-SHIFT ZERO
U+0010 DATA LINK ESCAPE
U+0011 DEVICE CONTROL ONE
U+0012 DEVICE CONTROL TWO
U+0013 DEVICE CONTROL THREE
U+0014 DEVICE CONTROL FOUR
U+0015 NEGATIVE ACKNOWLEDGE
U+0016 SYNCHRONOUS IDLE
U+0017 END OF TRANSMISSION BLOCK
U+0018 CANCEL
U+0019 END OF MEDIUM
U+001A SUBSTITUTE
U+001B ESCAPE
U+001C INFORMATION SEPARATOR FOUR
U+001C FILE SEPARATOR
U+001D INFORMATION SEPARATOR THREE
U+001D GROUP SEPARATOR
U+001E INFORMATION SEPARATOR TWO
U+001E RECORD SEPARATOR
U+001F INFORMATION SEPARATOR ONE
U+001F UNIT SEPARATOR
U+007F DELETE
U+0082 BREAK PERMITTED HERE
U+0083 NO BREAK HERE
U+0084 INDEX
U+0085 NEXT LINE
U+0086 START OF SELECTED AREA
U+0087 END OF SELECTED AREA
U+0088 CHARACTER TABULATION SET
U+0088 HORIZONTAL TABULATION SET
U+0089 CHARACTER TABULATION WITH JUSTIFICATION
U+0089 HORIZONTAL TABULATION WITH JUSTIFICATION
U+008A LINE TABULATION SET
U+008A VERTICAL TABULATION SET
U+008B PARTIAL LINE FORWARD
U+008B PARTIAL LINE DOWN
U+008C PARTIAL LINE BACKWARD
U+008C PARTIAL LINE UP
U+008D REVERSE LINE FEED
U+008D REVERSE INDEX
U+008E SINGLE SHIFT TWO
U+008E SINGLE-SHIFT-2
U+008F SINGLE SHIFT THREE
U+008F SINGLE-SHIFT-3
U+0090 DEVICE CONTROL STRING
U+0091 PRIVATE USE ONE
U+0091 PRIVATE USE-1
U+0092 PRIVATE USE TWO
U+0092 PRIVATE USE-2
U+0093 SET TRANSMIT STATE
U+0094 CANCEL CHARACTER
U+0095 MESSAGE WAITING
U+0096 START OF GUARDED AREA
U+0096 START OF PROTECTED AREA
U+0097 END OF GUARDED AREA
U+0097 END OF PROTECTED AREA
U+0098 START OF STRING
U+009A SINGLE CHARACTER INTRODUCER
U+009B CONTROL SEQUENCE INTRODUCER
U+009C STRING TERMINATOR
U+009D OPERATING SYSTEM COMMAND
U+009E PRIVACY MESSAGE
U+009F APPLICATION PROGRAM COMMAND
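Several code points in Table X carry more than one alias (U+000A has three), but for `\N{...}` to be unambiguous each alias must name exactly one code point. The sketch below encodes a hypothetical excerpt of the table and checks that property at compile time; `ControlAlias`, `kTableX`, `alias_count`, and `aliases_are_unique` are illustrative names only.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string_view>

struct ControlAlias {
    std::uint32_t code_point;
    std::string_view alias;
};

// Excerpt of Table X; a code point may appear on several rows.
constexpr std::array<ControlAlias, 8> kTableX{{
    {0x0009, "CHARACTER TABULATION"},
    {0x0009, "HORIZONTAL TABULATION"},
    {0x000A, "LINE FEED"},
    {0x000A, "NEW LINE"},
    {0x000A, "END OF LINE"},
    {0x001C, "INFORMATION SEPARATOR FOUR"},
    {0x001C, "FILE SEPARATOR"},
    {0x0085, "NEXT LINE"},
}};

// Number of aliases the excerpt gives for a code point.
constexpr std::size_t alias_count(std::uint32_t cp) {
    std::size_t n = 0;
    for (const auto& e : kTableX)
        if (e.code_point == cp) ++n;
    return n;
}

// True if no alias string appears on two rows.
constexpr bool aliases_are_unique() {
    for (std::size_t i = 0; i < kTableX.size(); ++i)
        for (std::size_t j = i + 1; j < kTableX.size(); ++j)
            if (kTableX[i].alias == kTableX[j].alias) return false;
    return true;
}

static_assert(aliases_are_unique(), "an alias in the excerpt names two code points");
```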

Change in Table 17 of 15.11 [cpp.predefined] paragraph 1.8
<http://eel.is/c++draft/cpp.predefined#1.8>:

*Drafting note:* the final value for the __cpp_named_character_escapes
feature-test macro will be selected by the project editor to reflect the
date of approval.

Table 17 — Feature-test macros [tab:cpp.predefined.ft]
Macro name    Value
[…] […]
__cpp_modules 201907L
__cpp_named_character_escapes XXXXXXL *** placeholder ***
__cpp_namespace_attributes 201411L
[…] […]
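Code that wants to use the new escape while still building with older compilers could test the feature-test macro. Because its final value is still the placeholder XXXXXXL, this sketch checks only whether `__cpp_named_character_escapes` is defined, assuming a compiler that implements the proposal defines it. The constant `kEAcute` is a hypothetical name; both branches spell U+00E9 LATIN SMALL LETTER E WITH ACUTE, in uppercase as the draft *n-char* grammar requires.

```cpp
// Prefer the named escape when the implementation advertises support for it.
#if defined(__cpp_named_character_escapes)
inline constexpr char32_t kEAcute[] = U"\N{LATIN SMALL LETTER E WITH ACUTE}";
#else
inline constexpr char32_t kEAcute[] = U"\u00E9";  // fallback: hex-quad form
#endif
```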


On Tue, Mar 1, 2022 at 2:48 AM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:

> On 01/03/2022 01.04, Steve Downey wrote:
> > I'll upload a D2071 to the wiki. Jens, do you have a preference about R2
> > vs R3 for it, since we did review the paper last week?
>
> Keeping R2 is fine.
>
> > Wording:
>
> Italics for named-universal-character , please.
> Italics for n-char-sequence, please.
>
> (Both are grammar non-terminals.)
>
> Bullets instead of lone hyphens in the text.
>
>
> In the note:
>
> "character name alias" should always have "associated"
> in front of it, because I guess that's the term of art.
>
>
> NameAliases.txt should be monospace font.
>
>
> Jens
>
>
>
> > Modify [lex.charset]
> >
> > (lex.charset.3)The universal-character-name construct provides a way
> > to name other characters.
> >
> >
> > /n-char/: one of
> > |A B C D E F G H I J K L M N O P Q R S T U V W X Y Z|
> > |0 1 2 3 4 5 6 7 8 9|
> > |U+002D HYPHEN-MINUS|
> > |U+0020 SPACE|
> >
> > /n-char-sequence/:
> > /n-char/
> > /n-char-sequence/ /n-char/
> >
> > /named-universal-character/:
> > |\N| { /n-char-sequence/ }
> >
> > /hex-quad/:
> > /hexadecimal-digit/ /hexadecimal-digit/ /hexadecimal-digit/ /hexadecimal-digit/
> >
> > /universal-character-name/:
> > |\u| /hex-quad/
> > |\U| /hex-quad/ /hex-quad/
> > /named-universal-character/
> >
> > A /universal-character-name/ of the form |\u| /hex-quad/ or |\U|
> > /hex-quad/ /hex-quad/ designates the character in the translation
> > character set whose UCS scalar value is the hexadecimal number
> > represented by the sequence of /hexadecimal-digits/ in the
> > /universal-character-name/. The program is ill-formed if that number is
> > not a UCS scalar value.
> >
> > A universal-character-name that is a named-universal-character
> > designates the character named by its n-char-sequence. A character is so
> > named if the n-char-sequence is equal to - the associated character name or
> > associated character name alias specified in ISO/IEC 10646 subclause “Code
> > charts and lists of character names” or - the control code alias given in
> > Table X.
> >
> > [Note: The aliases in table X are provided for control characters
> > which otherwise have no associated character name or character name alias.
> > These names are derived from the Unicode Character Database’s
> > NameAliases.txt. For historical reasons, control characters are formally
> > unnamed. – end note]
> >
> >
> > On Sun, Feb 27, 2022 at 3:36 AM Corentin Jabot <corentinjabot_at_[hidden]> wrote:
> >
> >
> >
> > On Sun, Feb 27, 2022 at 9:24 AM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
> >
> >
> > Steve, please make sure to upload your fixed paper to the 2022-03-11 core
> > telecon wiki, under the "D" name.
> >
> > On 26/02/2022 09.42, Corentin Jabot wrote:
> > >
> > >
> > > On Fri, Feb 25, 2022 at 11:33 PM Jens Maurer <Jens.Maurer_at_[hidden]> wrote:
> > >
> > > On 25/02/2022 23.20, Corentin Jabot wrote:
> > > > Can we flip it around?
> > > >
> > > >
> > > > Then the named-universal-character designates the element of the
> > > > translation character set whose UCS scalar value is equal to the code
> > > > point of that character.
> > > > Otherwise, the program is ill-formed.
> > > > [Note: The lists of names and aliases are guaranteed to be disjoint.
> > > > An n-char sequence will be found in at most one list. --end note]
> > >
> > > We want to avoid "matches" because it might mean "some fuzzy match"
> > > instead of equality.
> > >
> > > We want to start the paragraph with the same introducer as the
> > > preceding one.
> > >
> > >
> > > This is challenging.
> > > There are a lot of moving pieces.
> > > Can we rewrite the previous paragraph too?
> > >
> > >
> > > If the n-char-sequence of a named-universal-character is exactly equal
> > > to either
> > > - The name alias of a character as specified in ISO/IEC 10646 clause 34
> > > "Character names list"
> > > - The associated name of a character as specified in ISO/IEC 10646
> > > clause 34 "Character names list"
> > > - A control code alias of a character as specified in table X
> > > Then the named-universal-character designates the code point of that
> > > character.
> > >
> > > A universal-character-name designates the character in the translation
> > > character set whose UCS scalar value is:
> > > - For a universal-character-name of the form \u hex-quad or \U hex-quad
> > > hex-quad, the hexadecimal number represented by the sequence of
> > > hexadecimal-digits in the universal-character-name.
> > > - For a named-universal-character, the code point it designates.
> > >
> > > If a universal-character-name does not designate a UCS scalar value,
> > > the program is ill-formed.
> >
> > I think part of the confusion stemmed from the fact that people were
> > looking at an old version of the paper, because I hadn't updated the link
> > at the top of the wiki page.
> >
> > Suggestion:
> >
> > A universal-character-name that is a named-universal-character
> > designates the character named by its n-char-sequence. A character is so
> > named if the n-char-sequence is equal to
> > - the associated character name or associated character name alias
> > specified in ISO/IEC 10646 subclause "Code charts and lists of character
> > names" or
> > - the control code alias given in Table X.
> > The program is ill-formed if there is no such character.
> >
> >
> > This is great!
> >
> >
> > Jens
> >
>
>

Received on 2022-03-08 05:09:38