Date: Thu, 18 Jun 2020 07:46:42 -0700
On Thu, Jun 18, 2020 at 7:44 AM Tom Honermann via SG16 <
sg16_at_[hidden]> wrote:
> On 6/18/20 10:33 AM, Matthew Woehlke via Ext wrote:
>
> On 18/06/2020 10.05, Corentin Jabot wrote:
>
> On Thu, Jun 18, 2020, 15:52 Matthew Woehlke via SG16 wrote:
>
> Okay... I have a potential strong objection to this. It is not clear to
> me (not being a unicode expert) how this will interact with the many,
> many existing tools (not to mention programmer muscle memory) that
> define identifiers as:
>
> [_[:alpha:]][_[:alnum:]]*
>
>
> [_\p{XID_Start}]\p{XID_Continue}*
>
>
> - Are those equivalent?
>
> No.
>
>
> - Is that supported by common tools? (No.)
>
> I don't think so. It is supported by tools that conform to UTS#18
> <http://unicode.org/reports/tr18> level C2
> <http://unicode.org/reports/tr18/#C2>, specifically rule RL2.7
> <http://unicode.org/reports/tr18/#RL2.7>.
>
>
> - Is that a royal pain to type? (Yes.)
>
> Meh.
>
>
> - Are '\p{XID_Continue}' and '\w' synonyms?
>
> I don't believe so.
>
>
> I would very, ***VERY*** strongly like to see an analysis of whether
> this change is going to break existing tools that rely on the above
> definition of identifiers.
>
>
> These tools probably don't work as is with the set of characters
> currently allowed. For example, emojis are currently allowed and are not
> matched by your regex.
> Note also that "a\u00e9" can appear verbatim in code, which the regex
> wouldn't match.
>
>
> Okay, maybe not, but then I suppose my point is that if we're going to fix
> it, I would like to *fix* it, not just make it less broken.
>
> What particular form of "*fix*" do you have in mind?
>
I'd like to understand what is "broken" first :-)
Escaping characters?
Or something about tools that naively process C++ code? I.e., are we
trying to make things easier for naive tools?
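
To make the gap discussed above concrete, here is a minimal sketch in Python. It is an illustration only, not the proposed C++ rule: it assumes that Python's own identifier check (str.isidentifier(), which is built on XID_Start / XID_Continue plus "_") is a close enough stand-in for [_\p{XID_Start}]\p{XID_Continue}*, and that the ASCII character classes below approximate [_[:alpha:]][_[:alnum:]]* in the C locale. The helper names are made up for the example.

```python
import re

# ASCII-only pattern; an approximation of [_[:alpha:]][_[:alnum:]]* in the C locale.
ASCII_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_ascii_ident(s: str) -> bool:
    """True if the whole string matches the ASCII-only identifier pattern."""
    return ASCII_IDENT.fullmatch(s) is not None


def is_xid_ident(s: str) -> bool:
    """Stand-in for the XID-based rule (assumption: Python's identifier
    rule, based on XID_Start / XID_Continue plus '_', is close enough)."""
    return s.isidentifier()


samples = [
    "foo_1",        # plain ASCII identifier
    "a\u00e9",      # 'a' followed by U+00E9 (e with acute)
    "a\u0301",      # 'a' followed by U+0301 (combining acute accent)
    "\U0001F600",   # an emoji
]

for s in samples:
    # ascii() keeps the output readable on terminals without Unicode fonts.
    print(ascii(s), "ascii:", is_ascii_ident(s), "xid:", is_xid_ident(s))

# In Python's re module, \w is not a synonym for XID_Continue either:
# U+0301 is a valid identifier continuation but is not a \w character.
combining_is_word = re.fullmatch(r"\w", "\u0301") is not None
print("U+0301 matches \\w:", combining_is_word)
```

The two patterns agree on plain ASCII names and diverge as soon as a non-ASCII letter or combining mark appears, and the emoji is rejected by the XID-based check. Other regex engines define \w differently, so the last check says something about Python's re only.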
> Tom.
> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
Received on 2020-06-18 09:50:09