2020-04-08

Wording for UAX #31 identifiers

Change in 5.4 [lex.pptoken] paragraphs 1-2:

        preprocessing-token :
               header-name
               import-keyword
               module-keyword
               export-keyword
               ~~identifier~~ pp-identifier
               pp-number
               character-literal
               user-defined-character-literal
               string-literal
               user-defined-string-literal
               preprocessing-op-or-punc
               each non-white-space character that cannot be one of the above
Each preprocessing token that is converted to a token (5.6) shall have the lexical form of a keyword, an identifier, a literal, or an operator or punctuator.
A preprocessing token is the minimal lexical element of the language in translation phases 3 through 6. The categories of preprocessing token are: header names, placeholder tokens produced by preprocessing import and module directives (import-keyword, module-keyword, and export-keyword), preprocessing identifiers, preprocessing numbers, character literals (including user-defined character literals), string literals (including user-defined string literals), preprocessing operators and punctuators, and single non-white-space characters that do not lexically match the other preprocessing token categories. ...

Add a new section after 5.9:

5.10new Preprocessing identifiers [lex.ppident]

pp-identifier:
    identifier-nondigit
    pp-identifier digit
    pp-identifier identifier-nondigit

identifier-nondigit:
    nondigit
    universal-character-name
  
nondigit: one of
        a b c d e f g h i j k l m
        n o p q r s t u v w x y z
        A B C D E F G H I J K L M
        N O P Q R S T U V W X Y Z _

digit: one of
        0 1 2 3 4 5 6 7 8 9

Preprocessing identifier tokens lexically include all identifiers (5.10 [lex.name]) and keywords (5.11 [lex.key]).
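
As an illustration only (not part of the proposed wording), the pp-identifier grammar above can be sketched as a small C++17 recognizer. The function names are hypothetical, and the letter-range comparisons assume an ASCII-compatible execution character set. As in the grammar, a universal-character-name is accepted purely on its spelling (\u plus four hex digits, or \U plus eight); no check is made on which character it designates.

```cpp
#include <cstddef>
#include <string_view>

// nondigit: a-z, A-Z, or underscore (assumes contiguous letters, as in ASCII).
constexpr bool is_nondigit(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

constexpr bool is_digit(char c) { return c >= '0' && c <= '9'; }

constexpr bool is_hex_digit(char c) {
    return is_digit(c) || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}

// Length of a universal-character-name spelled at the start of s, or 0 if none.
constexpr std::size_t ucn_length(std::string_view s) {
    if (s.size() < 2 || s[0] != '\\') return 0;
    const std::size_t digits = s[1] == 'u' ? 4 : s[1] == 'U' ? 8 : 0;
    if (digits == 0 || s.size() < 2 + digits) return 0;
    for (std::size_t i = 0; i < digits; ++i)
        if (!is_hex_digit(s[2 + i])) return 0;
    return 2 + digits;
}

// True if the whole of s has the lexical form of a pp-identifier:
// an identifier-nondigit followed by any mix of digits and identifier-nondigits.
constexpr bool is_pp_identifier(std::string_view s) {
    std::size_t pos = 0;
    while (pos < s.size()) {
        if (is_nondigit(s[pos])) {
            ++pos;
        } else if (is_digit(s[pos])) {
            if (pos == 0) return false;  // a digit cannot start a pp-identifier
            ++pos;
        } else {
            const std::size_t n = ucn_length(s.substr(pos));
            if (n == 0) return false;
            pos += n;
        }
    }
    return pos > 0;  // the empty string is not a pp-identifier
}
```

Under these assumptions, spellings such as foo_1 and while are accepted (keywords are still pp-identifiers at this stage), while 1foo is rejected.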

Remove the grammar from 5.10 [lex.name]; it was moved to 5.10new [lex.ppident].

Remove tables [tab:lex.name.allowed] and [tab:lex.name.disallowed].

Change in 5.10 [lex.name] paragraph 1:

An identifier is an arbitrarily long sequence of letters and digits. ~~Each universal-character-name in an identifier shall designate a character whose encoding in ISO/IEC 10646 falls into one of the ranges specified in Table 2. The initial element shall not be a universal-character-name designating a character whose encoding falls into one of the ranges specified in Table 3.~~ Upper- and lower-case letters are different. All characters are significant.
identifier:
      pp-identifier
A universal-character-name at the start of an identifier shall designate a character of class XID_Start; any other universal-character-name in an identifier shall designate a character of class XID_Continue (see ISO/IEC 10646 for the definition of the classes). [ Footnote: On systems in which linkers cannot accept extended characters, an encoding of the universal-character-name may be used in forming valid external identifiers. For example, some otherwise unused character or sequence of characters may be used to encode the \u in a universal-character-name. Extended characters may produce a long external identifier, but C++ does not place a translation limit on significant characters for external identifiers. ~~In C++, upper- and lower-case letters are considered different for all identifiers, including external identifiers.~~ ] An identifier shall conform to the NFC normalization specified in ISO/IEC 10646.
[ Note: Upper- and lower-case letters are considered different for all identifiers. -- end note ]
[ Note: In translation phase 4, identifier also includes those preprocessing-tokens (5.4 [lex.pptoken]) differentiated as keywords (5.11 [lex.key]) in the later translation phase 7 (5.6 [lex.token]). -- end note ]
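
As an illustration only (not part of the proposed wording), the XID_Start / XID_Continue rule above might be checked along the following lines in C++17. The two toy_xid_* predicates are hypothetical stand-ins that cover only a handful of code points; a real implementation would consult the full Unicode property data. The sketch works on already-decoded code points, assumes every code point of U+0080 or above was spelled as a universal-character-name, and does not attempt the NFC check.

```cpp
#include <cstddef>
#include <string_view>

// HYPOTHETICAL, deliberately tiny subset of XID_Start (Latin-1 letters only).
constexpr bool toy_xid_start(char32_t c) {
    return (c >= 0x00C0 && c <= 0x00D6) ||  // U+00C0..U+00D6
           (c >= 0x00D8 && c <= 0x00F6) ||  // U+00D8..U+00F6 (skips U+00D7, the multiplication sign)
           (c >= 0x00F8 && c <= 0x00FF);    // U+00F8..U+00FF (skips U+00F7, the division sign)
}

// HYPOTHETICAL, deliberately tiny subset of XID_Continue:
// the toy XID_Start set plus the combining diacritical marks U+0300..U+036F.
constexpr bool toy_xid_continue(char32_t c) {
    return toy_xid_start(c) || (c >= 0x0300 && c <= 0x036F);
}

// Checks the grammar for basic characters and the XID rule for the rest.
// NFC conformance is NOT checked here.
constexpr bool is_identifier(std::u32string_view s) {
    if (s.empty()) return false;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char32_t c = s[i];
        const bool first = (i == 0);
        if (c < 0x80) {
            // Basic characters follow the pp-identifier grammar directly.
            const bool nondigit = (c >= U'a' && c <= U'z') ||
                                  (c >= U'A' && c <= U'Z') || c == U'_';
            const bool digit = (c >= U'0' && c <= U'9');
            if (!(nondigit || (digit && !first))) return false;
        } else if (first ? !toy_xid_start(c) : !toy_xid_continue(c)) {
            // Characters designated by a universal-character-name follow the XID rule.
            return false;
        }
    }
    return true;
}
```

With these toy tables, a spelling ending in \u00E9 is accepted, while one that begins with the combining mark \u0301 is rejected because that character is in XID_Continue but not in XID_Start.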

Change in 5.11 [lex.key] paragraph 1:

keyword:
  any ~~identifier~~ pp-identifier listed in Table [tab:lex.key]
    import-keyword
    module-keyword
    export-keyword
The ~~identifiers~~ pp-identifiers shown in Table [tab:lex.key] are reserved for use as keywords (that is, they are unconditionally treated as keywords in phase 7) except in an attribute-token (9.12.1). ...
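
As an illustration only (not part of the proposed wording), the phase 7 treatment described above can be pictured as a table lookup on the spelling of a pp-identifier. The names below are hypothetical, keyword_subset lists only a few entries of Table [tab:lex.key], and the placeholder tokens import-keyword, module-keyword, and export-keyword are not modeled.

```cpp
#include <string_view>

enum class TokenKind { keyword, identifier };

// HYPOTHETICAL: a small subset of the keyword table, for illustration only.
constexpr std::string_view keyword_subset[] = {
    "class", "else", "if", "int", "namespace", "return", "using", "while",
};

// Converts the spelling of a pp-identifier into a phase 7 token kind.
// Inside an attribute-token, a listed spelling is not treated as a keyword.
constexpr TokenKind classify(std::string_view spelling,
                             bool in_attribute_token = false) {
    if (!in_attribute_token) {
        for (std::string_view kw : keyword_subset) {
            if (kw == spelling) return TokenKind::keyword;
        }
    }
    return TokenKind::identifier;
}
```

In this sketch, classify("while") yields a keyword, whereas classify("using", true) yields an identifier because of the attribute-token exception.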