P1949R3 presents the following code that, assuming I accurately
captured the discussion during the April 22nd SG16 telecon
(https://github.com/sg16-unicode/sg16-meetings#april-22nd-2020),
we intend to make well-formed. It is currently ill-formed because
\u0300 does not match identifier and is therefore lexed as '\'
followed by 'u0300'.
  #define accent(x) x##\u0300
  constexpr int accent(A) = 2;
  constexpr int gv2 = A\u0300;
  static_assert(gv2 == 2, "whatever");
However, the proposed wording would reject the following case
involving multiple combining characters:
  #define accent(x) x##\u0300\u0327
  constexpr int accent(A) = 2;
  constexpr int gv2 = A\u0300\u0327;
  static_assert(gv2 == 2, "whatever");
The rejection occurs because, under the proposed wording, each
universal-character-name that is not lexed as part of one of the
existing preprocessing-token cases is lexed as its own preprocessing
token; the attempted concatenation therefore produces two
preprocessing tokens (A\u0300 and \u0327). I don't know of a
principled reason for such a rejection, though it isn't clear to me
which characters should be permitted to be munched together.

One approach would be to introduce another new preprocessing-token
category that matches the proposed identifier-continue; max munch
would still always prefer identifier when such a sequence is preceded
by a character in XID_Start. We would still want to retain the
proposed new "each universal-character-name ..." category as a way to
avoid tearing of universal-character-names that name a character in
neither XID_Start nor XID_Continue.
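To make the identifier-continue idea concrete, here is a sketch of
how the second example would fare under that hypothetical category
(speculation about the idea above, not the proposed wording):

  // Speculative: if \u0300\u0327 were lexed as a single
  // identifier-continue preprocessing token, the paste below would
  // produce the single identifier A\u0300\u0327 (both characters are
  // XID_Continue) and the rejection described above would go away.
  #define accent(x) x##\u0300\u0327
  constexpr int accent(A) = 2;        // declares A\u0300\u0327
  constexpr int gv2 = A\u0300\u0327;  // names the same identifier
  static_assert(gv2 == 2, "whatever");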
I'm not convinced that this scenario is worth addressing, though it
strikes me as approximately as valuable as the first example.
Tom.