On Wed, 6 May 2020 at 10:09, Jens Maurer <Jens.Maurer@gmx.net> wrote:
On 06/05/2020 09.09, Corentin Jabot via SG16 wrote:
> More generally, concatenating 2 NFC sequences is not guaranteed to result in an NFC sequence.[1]
> Maybe NFC verification should be done on C++ tokens, not preprocessor tokens (because then we would have to check twice)?
> But I question whether spending so much time on these contrived examples is a valuable use of anyone's time.
>
> As such, making 
> #define accent(x)x##\uxxxx
>
> ill-formed is a course of action that I think should be entertained

For *any* value of xxxx, including valid XID_Start characters?

That seems undesirable: if xxxx is in XID_Start, then \uxxxx is a valid
single-character identifier, so the concatenation operation is totally fine.

> (Afaict, concatenating 2 valid identifiers results in a valid identifier in all cases)

And \uxxxx might be a valid identifier.

Let me know if that makes sense:

Identifiers (or pp-identifiers, in P1949R3 terms) are valid NFC as per P1949R3.
We know that while concatenating arbitrary NFC sequences may not yield an NFC sequence, concatenating NFC identifiers always yields NFC, because XID_Start characters are never combining characters.
So the wording in P1949R3 seems sufficient (while http://wiki.edg.com/pub/Wg21summer2020/SG16/uax31.html is not).
Notably, there should not be isolated universal-character-names that are not well-formed identifiers.
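
For instance, a minimal sketch of the benign case (assuming a compiler that accepts extended identifiers; PASTE is just an illustrative helper macro, not something from the paper):

#define PASTE(a, b) a##b      // illustrative helper
int PASTE(caf, \u00e9) = 0;   // \u00e9 (é) is XID_Start and precomposed,
                              // so the pasted identifier caf\u00e9 is NFC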

(I don't think forming well-formed identifiers from non-well-formed bits of identifiers is a use case we should concern ourselves with.)
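
For contrast, here is the contrived case from the top of the thread with a concrete combining mark filled in for \uxxxx (U+0301 COMBINING ACUTE ACCENT); a sketch of the pattern the NFC requirement would reject:

#define accent(x) x##\u0301
int accent(cafe) = 0;   // \u0301 is XID_Continue but not XID_Start, so \u0301 on
                        // its own is not a well-formed identifier, and the pasted
                        // identifier cafe\u0301 is not in NFC (NFC is caf\u00e9)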

Jens