SG16

Subject: Re: [isocpp-ext] P1949R4 - C++ Identifier Syntax using Unicode Standard Annex 31
From: Matthew Woehlke (mwoehlke.floss_at_[hidden])
Date: 2020-06-18 09:33:30


On 18/06/2020 10.05, Corentin Jabot wrote:
> On Thu, Jun 18, 2020, 15:52 Matthew Woehlke via SG16 wrote:
>> Okay... I have a potential strong objection to this. It is not clear to
>> me (not being a unicode expert) how this will interact with the many,
>> many existing tools (not to mention programmer muscle memory) that
>> define identifiers as:
>>
>> [_[:alpha:]][_[:alnum:]]*
>
> [_\p{XID_Start}]\p{XID_Continue}*

- Are those equivalent?

- Is that supported by common tools? (No.)

- Is that a royal pain to type? (Yes.)

- Are '\p{XID_Continue}' and '\w' synonyms?
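
For anyone wanting to poke at the first and last questions, here is a rough sketch in Python. It rests on some assumptions: the POSIX pattern is taken in the "C" locale, where it reduces to ASCII; Python's str.isidentifier() stands in for [_\p{XID_Start}]\p{XID_Continue}*, since Python defines its own identifiers with those same two properties; and '\w' means Python's Unicode-aware '\w', which may differ from '\w' in other regex engines. The helper names are made up for illustration.

```python
import re

# Assumption: in the "C" locale, [_[:alpha:]][_[:alnum:]]* is ASCII-only.
ASCII_IDENT = re.compile(r"[_A-Za-z][_A-Za-z0-9]*")
# Python's \w on str patterns: Unicode alphanumerics plus underscore.
WORD_RUN = re.compile(r"\w+")


def is_ascii_ident(s: str) -> bool:
    """True if s matches the traditional ASCII identifier pattern."""
    return ASCII_IDENT.fullmatch(s) is not None


def is_xid_ident(s: str) -> bool:
    """True if s is '_' or XID_Start followed by XID_Continue characters.

    str.isidentifier() is used as a stand-in for the UAX #31 pattern.
    """
    return s.isidentifier()


def is_word_run(s: str) -> bool:
    """True if every character of s matches Python's \\w."""
    return WORD_RUN.fullmatch(s) is not None


if __name__ == "__main__":
    samples = [
        "foo_1",        # plain ASCII identifier
        "a\u00e9",      # 'a' + LATIN SMALL LETTER E WITH ACUTE
        "a\u00b7b",     # MIDDLE DOT between two letters
        "\U0001F600",   # an emoji
    ]
    for s in samples:
        print(ascii(s), is_ascii_ident(s), is_xid_ident(s), is_word_run(s))
```

Under those assumptions the two identifier patterns are not equivalent (the ASCII one rejects "a\u00e9"), and '\w' is not a synonym for XID_Continue either: U+00B7 MIDDLE DOT is XID_Continue but is not matched by Python's '\w'.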

>> I would very, ***VERY*** strongly like to see an analysis of whether
>> this change is going to break existing tools that rely on the above
>> definition of identifiers.
>
> These tools probably don't work as it is with the set of characters allowed
> currently. For example, emojis are currently allowed and not matched by
> your regex.
> Note also that "a\u00e9" can appear verbatim in code, which the regex
> wouldn't match.

Okay, maybe not, but then I suppose my point is that if we're going to
fix it, I would like to *fix* it, not just make it less broken.

-- 
Matthew

SG16 list run by sg16-owner@lists.isocpp.org