Re: [SG16] Multiple combining characters and P1949R3: C++ Identifier Syntax using Unicode Standard Annex 31

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Wed, 6 May 2020 12:00:46 +0200
On Wed, 6 May 2020 at 10:09, Jens Maurer <Jens.Maurer_at_[hidden]> wrote:

> On 06/05/2020 09.09, Corentin Jabot via SG16 wrote:
> > More generally, concatenating 2 NFC sequences is not guaranteed to
> > result in an NFC sequence.[1]
> > Maybe NFC verification should be done on C++ tokens, not preprocessor
> > tokens (because then we would have to check twice)?
> > But I question whether spending so much time on these
> > contrived examples is a valuable use of anyone's time.
> >
> > As such, making
> > #define accent(x) x##\uxxxx
> >
> > ill-formed is a course of action that I think should be entertained.
>
> For *any* value of xxxx, including valid XID_Start characters?
> That seems undesirable: if xxxx is in XID_Start, then \uxxxx is a valid
> single-character identifier, so the concatenation operation is totally
> fine.
>
> > (Afaict, concatenating 2 valid identifiers results in a valid identifier
> > in all cases.)
>
> And \uxxxx might be a valid identifier.

Let me know if that makes sense:

An *identifier* (or *pp-identifier* in P1949R3) is valid NFC, as per P1949R3.
We know that while the concatenation of arbitrary NFC sequences may not be NFC,
concatenating NFC identifiers always yields NFC, because XID_Start characters
are never combining characters.
So, from that, the wording in P1949R3 seems sufficient (while
http://wiki.edg.com/pub/Wg21summer2020/SG16/uax31.html is not).
Notably, there should be no isolated universal-character-names that are not
well-formed identifiers.

(I don't think forming well-formed identifiers from ill-formed fragments of
identifiers is a use case we should concern ourselves with.)

> Jens

Received on 2020-05-06 05:04:52