On Thu, Jun 18, 2020 at 7:44 AM Tom Honermann via SG16 <sg16@lists.isocpp.org> wrote:
On 6/18/20 10:33 AM, Matthew Woehlke via Ext wrote:
On 18/06/2020 10.05, Corentin Jabot wrote:
On Thu, Jun 18, 2020, 15:52 Matthew Woehlke via SG16 wrote:
Okay... I have a potential strong objection to this. It is not clear to
me (not being a Unicode expert) how this will interact with the many,
many existing tools (not to mention programmer muscle memory) that
define identifiers as:

    [_[:alpha:]][_[:alnum:]]*

[_\p{XID_Start}]\p{XID_Continue}*

- Are those equivalent?
No.

- Is that supported by common tools? (No.)
I don't think so. It is supported by tools that conform to UTS #18 Level 2 (conformance clause C2), specifically rule RL2.7 (Full Properties).

- Is that a royal pain to type? (Yes.)
Meh.

- Are '\p{XID_Continue}' and '\w' synonyms?
I don't believe so.
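
One way to probe that last question is to test a couple of characters against both sets. The sketch below uses Python purely as a test bench and rests on two assumptions: that `str.isidentifier()` closely tracks XID_Start/XID_Continue (Python's `re` module has no `\p{...}` support), and that Python's `\w` is representative, which it may not be, since `\w` differs between regex engines. The helper names are hypothetical.

```python
import re

def is_xid_continue_run(text: str) -> bool:
    # str.isidentifier() checks one start character followed by continue
    # characters. Prefixing a known start character ("a") leaves only the
    # continue check for `text`. Assumption: this tracks XID_Continue.
    return ("a" + text).isidentifier()

def is_word_run(text: str) -> bool:
    # Python's \w for str patterns: alphanumerics plus underscore.
    return re.fullmatch(r"\w+", text) is not None

samples = {
    "COMBINING ACUTE ACCENT (U+0301, category Mn)": "\u0301",
    "SUPERSCRIPT TWO (U+00B2, category No)": "\u00b2",
    "LATIN SMALL LETTER E WITH ACUTE (U+00E9)": "\u00e9",
}

for label, ch in samples.items():
    print(f"{label}: continue-like={is_xid_continue_run(ch)}, \\w={is_word_run(ch)}")
```

If the two columns disagree for any sample, the sets are not synonyms in that engine. An engine that implements `\w` per UTS #18 may give different answers, so this only speaks to Python's `re`.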

I would very, ***VERY*** strongly like to see an analysis of whether
this change is going to break existing tools that rely on the above
definition of identifiers.

These tools probably don't work as it is, given the set of characters
currently allowed. For example, emojis are currently allowed and are not
matched by your regex.
Note also that "a\u00e9" can appear verbatim in code, which the regex
wouldn't match.
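
The mismatch described here can be sketched in a few lines. This is only an illustration under stated assumptions: the POSIX classes are read as ASCII-only (as in the "C" locale), Python's `str.isidentifier()` stands in for the XID-based pattern, and the names `ASCII_IDENTIFIER`, `matches_ascii_pattern` and `matches_xid_like` are hypothetical.

```python
import re

# ASCII reading of [_[:alpha:]][_[:alnum:]]* (assumption: "C" locale).
ASCII_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def matches_ascii_pattern(name: str) -> bool:
    return ASCII_IDENTIFIER.fullmatch(name) is not None

def matches_xid_like(name: str) -> bool:
    # Assumption: str.isidentifier() approximates
    # [_\p{XID_Start}]\p{XID_Continue}*.
    return name.isidentifier()

samples = {
    "plain ASCII": "foo_1",
    "literal e-acute": "a\u00e9",          # the character itself in the source
    "escape spelled out": r"a\u00e9",      # backslash, 'u', four hex digits
    "emoji": "\U0001F600",
}

for label, name in samples.items():
    print(f"{label}: ascii={matches_ascii_pattern(name)}, "
          f"xid_like={matches_xid_like(name)}")
```

Note that the spelled-out escape contains a backslash, so neither check accepts it; a tool would have to decode such escapes before applying either pattern.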

Okay, maybe not, but then I suppose my point is that if we're going to fix it, I would like to *fix* it, not just make it less broken.

What particular form of "*fix*" do you have in mind?


I'd like to understand what is "broken" first :-)
Escaping characters?
Or something about tools which try to naively process C++ code? i.e., are we trying to make things easier for naive tools?

Tom.

--
SG16 mailing list
SG16@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg16