These changes are relative to [N4901] “Working Draft, Standard for Programming Language C++”.
Modify [lex.charset]
(lex.charset.3) The universal-character-name construct provides a way to name other characters.

n-char: one of
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    0 1 2 3 4 5 6 7 8 9
    U+002D HYPHEN-MINUS
    U+0020 SPACE

n-char-sequence:
    n-char
    n-char-sequence n-char

named-universal-character:
    \N { n-char-sequence }

hex-quad:
    hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit

universal-character-name:
    \u hex-quad
    \U hex-quad hex-quad
    named-universal-character

A universal-character-name of the form \u hex-quad or \U hex-quad hex-quad designates the character in the translation character set whose UCS scalar value is the hexadecimal number represented by the sequence of hexadecimal-digits in the universal-character-name. The program is ill-formed if that number is not a UCS scalar value.
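As an illustration only (not part of the proposed wording), the scalar-value rule for the \u and \U forms can be sketched as below. The helper names `is_ucs_scalar_value` and `ucn_hex_value` are hypothetical, and the sketch assumes an ASCII-compatible encoding for `char`.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// A UCS scalar value is any code point in [0, 0x10FFFF] that is not a
// surrogate code point (0xD800..0xDFFF).
constexpr bool is_ucs_scalar_value(std::uint32_t v) {
    return v <= 0x10FFFF && !(v >= 0xD800 && v <= 0xDFFF);
}

// Hypothetical helper: interprets the hexadecimal-digits of a \u (4 digits)
// or \U (8 digits) escape, given without the prefix. Returns nullopt for a
// wrong digit count, a non-hex digit, or a number that is not a UCS scalar
// value (the case in which the program is ill-formed).
// Assumes 'a'..'f' and 'A'..'F' are contiguous, as in ASCII.
constexpr std::optional<std::uint32_t> ucn_hex_value(std::string_view digits) {
    if (digits.size() != 4 && digits.size() != 8) return std::nullopt;
    std::uint32_t value = 0;
    for (char c : digits) {
        std::uint32_t d = 0;
        if (c >= '0' && c <= '9') d = c - '0';
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else return std::nullopt;
        value = value * 16 + d;
    }
    if (!is_ucs_scalar_value(value)) return std::nullopt;
    return value;
}

static_assert(ucn_hex_value("00E9") == 0xE9u);
static_assert(!ucn_hex_value("D800"));      // surrogate, not a scalar value
static_assert(!ucn_hex_value("00110000"));  // above U+10FFFF
```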
A universal-character-name that is a named-universal-character designates the character named by its n-char-sequence. A character is so named if the n-char-sequence is equal to
- the associated character name or associated character name alias specified in ISO/IEC 10646 subclause “Code charts and lists of character names” or
- the control code alias given in Table X.
[Note: The aliases in Table X are provided for control characters which otherwise have no associated character name or character name alias. These names are derived from the Unicode Character Database’s NameAliases.txt. For historical reasons, control characters are formally unnamed. – end note]
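The "is equal to" requirement can be sketched as an exact string comparison against two name lists. This is a hedged illustration, not an implementation strategy the paper prescribes: `NamedCodePoint`, `is_n_char`, and `find_named_character` are hypothetical names, `control_code_aliases` holds only a few rows of Table X, and `character_names` is a three-entry stand-in for the ISO/IEC 10646 name list.

```cpp
#include <array>
#include <optional>
#include <string_view>

struct NamedCodePoint {
    std::string_view name;
    char32_t code_point;
};

// Excerpt of the control code aliases from Table X.
inline constexpr std::array<NamedCodePoint, 5> control_code_aliases{{
    {"NULL", 0x0000},
    {"CHARACTER TABULATION", 0x0009},
    {"HORIZONTAL TABULATION", 0x0009},
    {"LINE FEED", 0x000A},
    {"NEW LINE", 0x000A},
}};

// Illustrative stand-in for the ISO/IEC 10646 name list (three entries only).
inline constexpr std::array<NamedCodePoint, 3> character_names{{
    {"SPACE", 0x0020},
    {"HYPHEN-MINUS", 0x002D},
    {"LATIN SMALL LETTER A", 0x0061},
}};

// n-char: A-Z, 0-9, U+002D HYPHEN-MINUS, U+0020 SPACE.
// Assumes an ASCII-compatible encoding for char.
constexpr bool is_n_char(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') ||
           c == '-' || c == ' ';
}

// Returns the designated code point, or nullopt if the sequence is not a
// valid n-char-sequence or is not exactly equal to any listed name.
constexpr std::optional<char32_t> find_named_character(std::string_view seq) {
    if (seq.empty()) return std::nullopt;
    for (char c : seq)
        if (!is_n_char(c)) return std::nullopt;
    for (const auto& e : character_names)
        if (e.name == seq) return e.code_point;
    for (const auto& e : control_code_aliases)
        if (e.name == seq) return e.code_point;
    return std::nullopt;
}

static_assert(find_named_character("LINE FEED") == char32_t{0x0A});
static_assert(find_named_character("NEW LINE") == char32_t{0x0A});
static_assert(!find_named_character("line feed"));   // lowercase is not an n-char
static_assert(!find_named_character("LINE  FEED"));  // no fuzzy matching
```

Several aliases may map to the same code point, as the two U+000A entries show. A sequence that differs from every listed name, even only by case or spacing, finds nothing.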
Table X
| Code point | Control Code Alias |
|---|---|
| U+0000 | NULL |
| U+0001 | START OF HEADING |
| U+0002 | START OF TEXT |
| U+0003 | END OF TEXT |
| U+0004 | END OF TRANSMISSION |
| U+0005 | ENQUIRY |
| U+0006 | ACKNOWLEDGE |
| U+0007 | ALERT |
| U+0008 | BACKSPACE |
| U+0009 | CHARACTER TABULATION |
| U+0009 | HORIZONTAL TABULATION |
| U+000A | LINE FEED |
| U+000A | NEW LINE |
| U+000A | END OF LINE |
| U+000B | LINE TABULATION |
| U+000B | VERTICAL TABULATION |
| U+000C | FORM FEED |
| U+000D | CARRIAGE RETURN |
| U+000E | SHIFT OUT |
| U+000E | LOCKING-SHIFT ONE |
| U+000F | SHIFT IN |
| U+000F | LOCKING-SHIFT ZERO |
| U+0010 | DATA LINK ESCAPE |
| U+0011 | DEVICE CONTROL ONE |
| U+0012 | DEVICE CONTROL TWO |
| U+0013 | DEVICE CONTROL THREE |
| U+0014 | DEVICE CONTROL FOUR |
| U+0015 | NEGATIVE ACKNOWLEDGE |
| U+0016 | SYNCHRONOUS IDLE |
| U+0017 | END OF TRANSMISSION BLOCK |
| U+0018 | CANCEL |
| U+0019 | END OF MEDIUM |
| U+001A | SUBSTITUTE |
| U+001B | ESCAPE |
| U+001C | INFORMATION SEPARATOR FOUR |
| U+001C | FILE SEPARATOR |
| U+001D | INFORMATION SEPARATOR THREE |
| U+001D | GROUP SEPARATOR |
| U+001E | INFORMATION SEPARATOR TWO |
| U+001E | RECORD SEPARATOR |
| U+001F | INFORMATION SEPARATOR ONE |
| U+001F | UNIT SEPARATOR |
| U+007F | DELETE |
| U+0082 | BREAK PERMITTED HERE |
| U+0083 | NO BREAK HERE |
| U+0084 | INDEX |
| U+0085 | NEXT LINE |
| U+0086 | START OF SELECTED AREA |
| U+0087 | END OF SELECTED AREA |
| U+0088 | CHARACTER TABULATION SET |
| U+0088 | HORIZONTAL TABULATION SET |
| U+0089 | CHARACTER TABULATION WITH JUSTIFICATION |
| U+0089 | HORIZONTAL TABULATION WITH JUSTIFICATION |
| U+008A | LINE TABULATION SET |
| U+008A | VERTICAL TABULATION SET |
| U+008B | PARTIAL LINE FORWARD |
| U+008B | PARTIAL LINE DOWN |
| U+008C | PARTIAL LINE BACKWARD |
| U+008C | PARTIAL LINE UP |
| U+008D | REVERSE LINE FEED |
| U+008D | REVERSE INDEX |
| U+008E | SINGLE SHIFT TWO |
| U+008E | SINGLE-SHIFT-2 |
| U+008F | SINGLE SHIFT THREE |
| U+008F | SINGLE-SHIFT-3 |
| U+0090 | DEVICE CONTROL STRING |
| U+0091 | PRIVATE USE ONE |
| U+0091 | PRIVATE USE-1 |
| U+0092 | PRIVATE USE TWO |
| U+0092 | PRIVATE USE-2 |
| U+0093 | SET TRANSMIT STATE |
| U+0094 | CANCEL CHARACTER |
| U+0095 | MESSAGE WAITING |
| U+0096 | START OF GUARDED AREA |
| U+0096 | START OF PROTECTED AREA |
| U+0097 | END OF GUARDED AREA |
| U+0097 | END OF PROTECTED AREA |
| U+0098 | START OF STRING |
| U+009A | SINGLE CHARACTER INTRODUCER |
| U+009B | CONTROL SEQUENCE INTRODUCER |
| U+009C | STRING TERMINATOR |
| U+009D | OPERATING SYSTEM COMMAND |
| U+009E | PRIVACY MESSAGE |
| U+009F | APPLICATION PROGRAM COMMAND |
Change in table 17 of 15.11 [cpp.predefined] paragraph 1.8:
Drafting note: the final value for the __cpp_named_character_escapes feature test macro will be selected by the project editor to reflect the date of approval.
Table 17 — Feature-test macros [tab:cpp.predefined.ft]
| Macro name | Value |
|---|---|
| […] | […] |
| __cpp_modules | 201907L |
| __cpp_named_character_escapes | XXXXXXL ** placeholder ** |
| __cpp_namespace_attributes | 201411L |
| […] | […] |
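A minimal sketch of how user code might consult the feature-test macro. Because the final value is still a placeholder, the sketch only tests whether the macro is defined. The variable name `line_feed` is hypothetical, and the fallback branch assumes that `U'\n'` has the value 0x0A.

```cpp
// Use a named escape when the implementation advertises support for it,
// and fall back to a simple escape otherwise.
#if defined(__cpp_named_character_escapes)
inline constexpr char32_t line_feed = U'\N{LINE FEED}';  // control code alias from Table X
#else
inline constexpr char32_t line_feed = U'\n';  // assumed to be U+000A
#endif

static_assert(line_feed == 0x0A);
```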
On 01/03/2022 01.04, Steve Downey wrote:
> I'll upload a D2071 to the wiki. Jens, do you have a preference about R2 vs R3 for it, since we did review the paper last week?
Keeping R2 is fine.
> Wording:
Italics for named-universal-character, please.
Italics for n-char-sequence, please.
(Both are grammar non-terminals.)
Bullets instead of lone hyphens in the text.
In the note:
"character name alias" should always have "associated"
in front of it, because I guess that's the term of art.
NameAliases.txt should be monospace font.
Jens
> Modify [lex.charset]
>
> (lex.charset.3)The universal-character-name construct provides a way to name other characters.
>
>
> /n-char/: one of
> |A B C D E F G H I J K L M N O P Q R S T U V W X Y Z|
> |0 1 2 3 4 5 6 7 8 9|
> |U+002D HYPHEN-MINUS|
> |U+0020 SPACE|
>
> /n-char-sequence/:
> /n-char/
> /n-char-sequence/ /n-char/
>
> /named-universal-character/:
> |\N| { /n-char-sequence/ }
>
> /hex-quad/:
> /hexadecimal-digit/ /hexadecimal-digit/ /hexadecimal-digit/ /hexadecimal-digit/
>
> /universal-character-name/:
> |\u| /hex-quad/
> |\U| /hex-quad/ /hex-quad/
> /named-universal-character/
>
> A /universal-character-name/ of the form |\u| /hex-quad/ or |\U| /hex-quad/ /hex-quad/ designates the character in the translation character set whose UCS scalar value is the hexadecimal number represented by the sequence of /hexadecimal-digits/ in the /universal-character-name/. The program is ill-formed if that number is not a UCS scalar value.
>
> A universal-character-name that is a named-universal-character designates the character named by its n-char-sequence. A character is so named if the n-char-sequence is equal to - the associated character name or associated character name alias specified in ISO/IEC 10646 subclause “Code charts and lists of character names” or - the control code alias given in Table X.
>
> [Note: The aliases in table X are provided for control characters which otherwise have no associated character name or character name alias. These names are derived from the Unicode Character Database’s NameAliases.txt. For historical reasons, control characters are formally unnamed. – end note]
>
>
> On Sun, Feb 27, 2022 at 3:36 AM Corentin Jabot <corentinjabot@gmail.com> wrote:
>
>
>
> On Sun, Feb 27, 2022 at 9:24 AM Jens Maurer <Jens.Maurer@gmx.net> wrote:
>
>
> Steve, please make sure to upload your fixed paper to the 2022-03-11 core
> telecon wiki, under the "D" name.
>
> On 26/02/2022 09.42, Corentin Jabot wrote:
> >
> >
> > On Fri, Feb 25, 2022 at 11:33 PM Jens Maurer <Jens.Maurer@gmx.net> wrote:
> >
> > On 25/02/2022 23.20, Corentin Jabot wrote:
> > > Can we flip it around?
> > >
> > >
> > > Then the named-universal-character designates the element of the translation character set whose UCS scalar value is equal to the code point of that character.
> > > Otherwise, the program is ill-formed.
> > > [Note: The lists of names and aliases are guaranteed to be disjoint. An n-char sequence will be found in at most one list. --end note]
> >
> > We want to avoid "matches" because it might mean "some fuzzy match" instead of
> > equality.
> >
> > We want to start the paragraph with the same introducer as the preceding one.
> >
> >
> > This is challenging.
> > There are a lot of moving pieces.
> > Can we rewrite the previous paragraph too?
> >
> >
> > If the n-char-sequence of a named-universal-character is exactly equal to either
> > - The name alias of a character as specified in ISO/IEC 10646 clause 34 "Character names list"
> > - The associated name of a character as specified in ISO/IEC 10646 clause 34 "Character names list"
> > - A control code alias of a character as specified in table X
> > Then the named-universal-character designates the code point of that character.
> >
> > A universal-character-name designates the character in the translation character set whose UCS scalar value is:
> > - For a universal-character-name of the form \u hex-quad or \U hex-quad hex-quad, the hexadecimal number represented by the sequence of hexadecimal-digits in the universal-character-name.
> > - For a named-universal-character, the code point it designates.
> >
> > If a universal-character-name does not designate a UCS scalar value, the program is ill-formed.
>
> I think part of the confusion stemmed from the fact that people were looking at an old
> version of the paper, because I hadn't updated the link at the top of the wiki page.
>
> Suggestion:
>
> A universal-character-name that is a named-universal-character designates the
> character named by its n-char-sequence. A character is so named if the
> n-char-sequence is equal to
> - the associated character name or associated character name alias specified in
> ISO/IEC 10646 subclause "Code charts and lists of character names" or
> - the control code alias given in Table X.
> The program is ill-formed if there is no such character.
>
>
> This is great!
>
>
> Jens
>