P1949R3 presents the following code that, assuming I accurately
captured the discussion during the April 22nd SG16 telecon
(https://github.com/sg16-unicode/sg16-meetings#april-22nd-2020),
we intend to make well-formed. It is currently ill-formed because
\u0300 does not match identifier and is therefore lexed as '\'
followed by 'u0300'.
  #define accent(x) x##\u0300
  constexpr int accent(A) = 2;
  constexpr int gv2 = A\u0300;
  static_assert(gv2 == 2, "whatever");
However, the proposed wording would reject the following case
involving multiple combining characters:
  #define accent(x) x##\u0300\u0327
  constexpr int accent(A) = 2;
  constexpr int gv2 = A\u0300\u0327;
  static_assert(gv2 == 2, "whatever");
The rejection occurs because, under the proposed wording, each
universal-character-name that is not lexed as part of one of the
existing preprocessing-token cases is lexed as its own preprocessing
token; the attempted concatenation therefore produces two
preprocessing tokens (A\u0300 and \u0327). I don't know of a
principled reason for such a rejection, though it isn't clear to me
which characters should be permitted to be munched together.

One approach would be to introduce another new preprocessing-token
category that matches the proposed identifier-continue; max munch
would still always prefer identifier when such a sequence is preceded
by a character in XID_Start. We would still want to retain the
proposed new "each universal-character-name ..." category as a way to
avoid tearing of universal-character-names that name a character in
neither XID_Start nor XID_Continue.
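To make the identifier-continue idea concrete, here is a sketch of
how the second example would fare under that hypothetical category
(speculation about the idea above, not the proposed wording):

  // Speculative: if \u0300\u0327 were lexed as a single
  // identifier-continue preprocessing token, the paste below would
  // produce the single identifier A\u0300\u0327 (both characters are
  // XID_Continue) and the rejection described above would go away.
  #define accent(x) x##\u0300\u0327
  constexpr int accent(A) = 2;        // declares A\u0300\u0327
  constexpr int gv2 = A\u0300\u0327;  // names the same identifier
  static_assert(gv2 == 2, "whatever");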
I'm not convinced that this scenario is worth addressing, though it
strikes me as approximately as valuable as the first example.
Tom.