2020-04-10

Wording for UAX #31 identifiers

Add an entry in clause 2 [intro.refs]:

The Unicode Consortium. Unicode Standard Annex, UAX #44, Unicode Character Database [online]. Edited by TODO author. Revision XX; issued for Unicode 12.0.0. 2019-02-15 [viewed 2020-02-23]. Available at http://www.unicode.org/reports/tr44/tr44-XX.html TODO XXX

Change in 5.4 [lex.pptoken] paragraphs 1-2:

        preprocessing-token :
               header-name
               import-keyword
               module-keyword
               export-keyword
               ~~identifier~~ pp-identifier
               pp-number
               character-literal
               user-defined-character-literal
               string-literal
               user-defined-string-literal
               preprocessing-op-or-punc
               each non-white-space character that cannot be one of the above
Each preprocessing token that is converted to a token (5.6) shall have the lexical form of a keyword, an identifier, a literal, or an operator or punctuator.
A preprocessing token is the minimal lexical element of the language in translation phases 3 through 6. The categories of preprocessing token are: header names, placeholder tokens produced by preprocessing import and module directives (import-keyword, module-keyword, and export-keyword), preprocessing identifiers, preprocessing numbers, character literals (including user-defined character literals), string literals (including user-defined string literals), preprocessing operators and punctuators, and single non-white-space characters that do not lexically match the other preprocessing token categories. ...

Add a new section after 5.9:

5.10new Preprocessing identifiers [lex.ppident]

pp-identifier:
    identifier-nondigit
    pp-identifier digit
    pp-identifier identifier-nondigit

identifier-nondigit:
    nondigit
    universal-character-name
  
nondigit: one of
        a b c d e f g h i j k l m
        n o p q r s t u v w x y z
        A B C D E F G H I J K L M
        N O P Q R S T U V W X Y Z _

digit: one of
        0 1 2 3 4 5 6 7 8 9

Preprocessing identifier tokens lexically include all identifiers (5.10 [lex.name]) and keywords (5.11 [lex.key]).
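
Illustration (not part of the proposed wording): the pp-identifier grammar above is purely lexical and places no restriction on which character a universal-character-name designates. A minimal sketch of a recognizer follows. The function names are hypothetical, the universal-character-name forms (\u with four hexadecimal digits, \U with eight) are taken from [lex.charset], and the letter range checks assume an ASCII-compatible character set.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// nondigit: a-z, A-Z, _ (range checks assume an ASCII-compatible character set).
bool is_nondigit(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

bool is_digit(char c) { return c >= '0' && c <= '9'; }

bool is_hex_digit(char c) {
    return is_digit(c) || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}

// Length of a universal-character-name (\uXXXX or \UXXXXXXXX) at pos, or 0 if none.
std::size_t ucn_length(const std::string& s, std::size_t pos) {
    if (pos + 1 >= s.size() || s[pos] != '\\') return 0;
    std::size_t digits = 0;
    if (s[pos + 1] == 'u') digits = 4;
    else if (s[pos + 1] == 'U') digits = 8;
    else return 0;
    if (pos + 2 + digits > s.size()) return 0;
    for (std::size_t i = 0; i < digits; ++i)
        if (!is_hex_digit(s[pos + 2 + i])) return 0;
    return 2 + digits;
}

// identifier-nondigit: nondigit | universal-character-name
std::size_t identifier_nondigit_length(const std::string& s, std::size_t pos) {
    if (pos >= s.size()) return 0;
    if (is_nondigit(s[pos])) return 1;
    return ucn_length(s, pos);
}

// Length of the longest pp-identifier at the start of s, or 0 if s does not start with one.
std::size_t pp_identifier_length(const std::string& s) {
    std::size_t pos = identifier_nondigit_length(s, 0);  // first element must not be a digit
    if (pos == 0) return 0;
    while (pos < s.size()) {
        if (is_digit(s[pos])) { ++pos; continue; }
        std::size_t n = identifier_nondigit_length(s, pos);
        if (n == 0) break;
        pos += n;
    }
    return pos;
}

// Small self-check of the sketch.
void pp_identifier_demo() {
    assert(pp_identifier_length("foo_1 bar") == 5);
    assert(pp_identifier_length("1abc") == 0);
}
```

Note that this sketch accepts a spelling such as e\u0301 as a pp-identifier; whether it is also an identifier is decided separately by the rules in 5.10 [lex.name].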

Remove the grammar from 5.10 [lex.name]; it was moved to 5.10new [lex.ppident].

Remove tables [tab:lex.name.allowed] and [tab:lex.name.disallowed].

Change in 5.10 [lex.name] paragraph 1:

An identifier is an arbitrarily long sequence of letters and digits. ~~Each universal-character-name in an identifier shall designate a character whose encoding in ISO/IEC 10646 falls into one of the ranges specified in Table 2. The initial element shall not be a universal-character-name designating a character whose encoding falls into one of the ranges specified in Table 3.~~ Upper- and lower-case letters are different. All characters are significant.
identifier:
      pp-identifier
A universal-character-name at the start of an identifier shall designate a character of class XID_Start; any other universal-character-name in an identifier shall designate a character of class XID_Continue (see UAX #44 for the definition of the classes). [ Footnote: On systems in which linkers cannot accept extended characters, an encoding of the universal-character-name may be used in forming valid external identifiers. For example, some otherwise unused character or sequence of characters may be used to encode the \u in a universal-character-name. Extended characters may produce a long external identifier, but C++ does not place a translation limit on significant characters for external identifiers. ~~In C++, upper- and lower-case letters are considered different for all identifiers, including external identifiers.~~ ] An identifier shall conform to the NFC normalization specified in ISO/IEC 10646.
[ Note: Upper- and lower-case letters are considered different for all identifiers. -- end note ]
[ Note: In translation phase 4, identifier also includes those preprocessing-tokens (5.4 [lex.pptoken]) differentiated as keywords (5.11 [lex.key]) in the later translation phase 7 (5.6 [lex.token]). -- end note ]
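
Illustration (not part of the proposed wording): the XID_Start / XID_Continue rule for universal-character-names can be sketched as a property lookup followed by a position check. The lookup below is a deliberately tiny, hypothetical stand-in for the Unicode Character Database that knows only three code points; a real implementation would consult the full UAX #44 data. The names toy_xid_class and ucn_allowed are assumptions made for this sketch, and the NFC requirement is not checked here.

```cpp
#include <cassert>

enum class XidClass { None, Continue, Start };  // Start implies Continue

// Hypothetical stand-in for a Unicode Character Database lookup.
// Only three hand-picked code points are covered; everything else maps to None.
XidClass toy_xid_class(char32_t cp) {
    switch (cp) {
        case 0x00E9:  return XidClass::Start;     // LATIN SMALL LETTER E WITH ACUTE
        case 0x0301:  return XidClass::Continue;  // COMBINING ACUTE ACCENT
        case 0x1F600: return XidClass::None;      // GRINNING FACE
        default:      return XidClass::None;      // not covered by this toy table
    }
}

// Applies the proposed rule to a character designated by a universal-character-name:
// XID_Start is required at the start of an identifier, XID_Continue elsewhere.
bool ucn_allowed(char32_t cp, bool at_start) {
    XidClass c = toy_xid_class(cp);
    return at_start ? c == XidClass::Start : c != XidClass::None;
}

// Small self-check of the sketch.
void xid_demo() {
    assert(ucn_allowed(0x00E9, true));    // \u00E9 may start an identifier
    assert(!ucn_allowed(0x0301, true));   // \u0301 may not start an identifier
    assert(ucn_allowed(0x0301, false));   // but may appear later
}
```

The spelling e\u0301 passes this class check yet still fails the separate NFC requirement, because its NFC form is the single character \u00E9.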

Change in 5.11 [lex.key] paragraph 1:

keyword:
  any ~~identifier~~ pp-identifier listed in Table [tab:lex.key]
    import-keyword
    module-keyword
    export-keyword
The ~~identifiers~~ pp-identifiers shown in Table [tab:lex.key] are reserved for use as keywords (that is, they are unconditionally treated as keywords in phase 7) except in an attribute-token (9.12.1). ...

Change in 15.6 [cpp.replace] paragraph 10:

A preprocessing directive of the form
  # define identifier replacement-list new-line
defines an object-like macro that causes each subsequent instance of the macro name [ Footnote: ... ] to be replaced by the replacement list of preprocessing tokens that constitute the remainder of the directive. [ Footnote: ... ] [ Note: A pp-identifier that is not in NFC normalization form is not an identifier and thus is never the instance of a macro name. -- end note ] The replacement list is then rescanned for more macro names as specified below.

Change in 15.6 [cpp.replace] paragraph 12:

... Each subsequent instance of the function-like macro name followed by a ( as the next preprocessing token introduces the sequence of preprocessing tokens that is replaced by the replacement list in the definition (an invocation of the macro). The replaced sequence of preprocessing tokens is terminated by the matching ) preprocessing token, skipping intervening matched pairs of left and right parenthesis preprocessing tokens. Within the sequence of preprocessing tokens making up an invocation of a function-like macro, new-line is considered a normal white-space character. [ Note: A pp-identifier that is not in NFC normalization form is not an identifier and thus is never the instance of a macro name. -- end note ]

Add after 15.6.1 [cpp.subst] paragraph 2:

An identifier __VA_ARGS__ that occurs in the replacement list shall be treated as if it were a parameter, and the variable arguments shall form the preprocessing tokens used to replace it.
[ Note: A pp-identifier that is not in NFC normalization form is not an identifier and thus never names a parameter. -- end note ]
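
Illustration (not part of the proposed wording): the notes about NFC and macro names can be read against a small example. The sketch below assumes a compiler that accepts universal-character-names in identifiers and macro names; the function names are hypothetical. The spelling \u00E9 is in NFC, so it is an identifier and can name a macro. The decomposed spelling e\u0301 appears only in comments, since under the proposed wording it is a pp-identifier but not an identifier.

```cpp
#include <cassert>

// \u00E9 designates U+00E9, which is already in NFC, so it is an identifier
// and can serve as a macro name.
#define \u00E9 42

// Each later instance of the identifier \u00E9 is replaced by 42.
int nfc_macro_value() { return \u00E9; }

// Not compiled here: the decomposed spelling e\u0301 (U+0065 followed by U+0301)
// is a pp-identifier that is not in NFC. Under the proposed wording it is not an
// identifier, so it could neither be defined as a macro name nor be treated as
// an instance of the macro defined above.

// Small self-check of the sketch.
void nfc_macro_demo() { assert(nfc_macro_value() == 42); }
```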

Add to the bibliography:

The Unicode Consortium. Unicode Standard Annex, UAX #31, TODO title [online]. Edited by TODO author. Revision xx; issued for Unicode 12.0.0. 2019-02-15 [viewed 2020-02-23]. Available at http://www.unicode.org/reports/tr31/tr31-35.html