SG16

Subject: Re: [isocpp-ext] P1949R4 - C++ Identifier Syntax using Unicode Standard Annex 31
From: Tom Honermann (tom_at_[hidden])
Date: 2020-06-18 09:44:48


On 6/18/20 10:33 AM, Matthew Woehlke via Ext wrote:
> On 18/06/2020 10.05, Corentin Jabot wrote:
>> On Thu, Jun 18, 2020, 15:52 Matthew Woehlke via SG16 wrote:
>>> Okay... I have a potential strong objection to this. It is not clear to
>>> me (not being a Unicode expert) how this will interact with the many,
>>> many existing tools (not to mention programmer muscle memory) that
>>> define identifiers as:
>>>
>>>     [_[:alpha:]][_[:alnum:]]*
>>
>> [_\p{XID_Start}]\p{XID_Continue}*
>
> - Are those equivalent?
No.
>
> - Is that supported by common tools? (No.)
I don't think so.  It is supported by tools that conform to UTS#18
<http://unicode.org/reports/tr18> level C2
<http://unicode.org/reports/tr18/#C2>, specifically rule RL2.7
<http://unicode.org/reports/tr18/#RL2.7>.
>
> - Is that a royal pain to type? (Yes.)
Meh.
>
> - Are '\p{XID_Continue}' and '\w' synonyms?
I don't believe so.
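
To make the differences discussed above concrete, here is a minimal sketch in Python. It rests on some assumptions: Python's `re` module does not support `\p{...}`, so `str.isidentifier()` (which CPython defines in terms of XID_Start, XID_Continue and `_`) stands in for `[_\p{XID_Start}]\p{XID_Continue}*`, and the POSIX classes are spelled out as ASCII ranges, which matches their meaning only in the C locale. The helper names are hypothetical.

```python
import re

# ASCII spelling of [_[:alpha:]][_[:alnum:]]* (assumes the C locale).
ASCII_IDENT = re.compile(r"[_A-Za-z][_A-Za-z0-9]*")
# Python's Unicode-aware \w, used here only to compare against XID_Continue.
WORD_RUN = re.compile(r"\w+")


def is_ascii_ident(s: str) -> bool:
    """True if s matches the traditional ASCII identifier pattern."""
    return ASCII_IDENT.fullmatch(s) is not None


def is_xid_ident(s: str) -> bool:
    """Stand-in for [_\\p{XID_Start}]\\p{XID_Continue}* (an assumption:
    this is Python's own identifier rule, not the C++ grammar)."""
    return s.isidentifier()


def is_word_run(s: str) -> bool:
    """True if every character of s is matched by \\w."""
    return WORD_RUN.fullmatch(s) is not None


if __name__ == "__main__":
    samples = ["foo_1", "a\u00e9", "a\u00b7b", "\U0001F600"]
    for s in samples:
        print(ascii(s), is_ascii_ident(s), is_xid_ident(s), is_word_run(s))
```

In this sketch, "a\u00e9" is rejected by the ASCII pattern but accepted by the XID-based check, and "a\u00b7b" (U+00B7 MIDDLE DOT, which has XID_Continue) is accepted by the XID-based check but not by `\w`. That is consistent with the answers above, though Python's `\w` need not agree with `\w` in other regex engines.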
>
>>> I would very, ***VERY*** strongly like to see an analysis of whether
>>> this change is going to break existing tools that rely on the above
>>> definition of identifiers.
>>
>> These tools probably don't work as it is with the set of characters
>> currently allowed. For example, emojis are currently allowed and are
>> not matched by your regex.
>> Note also that "a\u00e9" can appear verbatim in code, which the regex
>> wouldn't match.
>
> Okay, maybe not, but then I suppose my point is that if we're going to
> fix it, I would like to *fix* it, not just make it less broken.
>
What particular form of "*fix*" do you have in mind?

Tom.



SG16 list run by sg16-owner@lists.isocpp.org