Date: Thu, 18 Jun 2020 10:33:30 -0400
On 18/06/2020 10.05, Corentin Jabot wrote:
> On Thu, Jun 18, 2020, 15:52 Matthew Woehlke via SG16 wrote:
>> Okay... I have a potential strong objection to this. It is not clear to
>> me (not being a unicode expert) how this will interact with the many,
>> many existing tools (not to mention programmer muscle memory) that
>> define identifiers as:
>>
>> [_[:alpha:]][_[:alnum:]]*
>
> [_\p{XID_Start}]\p{XID_Continue}*
- Are those equivalent?
- Is that supported by common tools? (No.)
- Is that a royal pain to type? (Yes.)
- Are '\p{XID_Continue}' and '\w' synonyms?
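
A rough way to probe the first and last questions, sketched in Python under
some assumptions: the POSIX classes are read as ASCII-only (C locale),
Python's str.isidentifier() stands in for the XID_Start/XID_Continue rule
(it also accepts a leading underscore), and Python's Unicode-aware \w stands
in for "\w" in general. Other regex engines define \w differently, and the
helper names here are only illustrative.

```python
import re

# ASCII-only reading of [_[:alpha:]][_[:alnum:]]* (assumes the C locale).
ASCII_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_ascii_ident(s: str) -> bool:
    """True if s matches the traditional ASCII identifier pattern."""
    return ASCII_IDENT.fullmatch(s) is not None


def is_xid_ident(s: str) -> bool:
    """Approximates [_\\p{XID_Start}]\\p{XID_Continue}* via str.isidentifier()."""
    return s.isidentifier()


def is_word_run(s: str) -> bool:
    """True if every character of s is matched by Python's Unicode \\w."""
    return re.fullmatch(r"\w+", s) is not None


if __name__ == "__main__":
    # "a\u00e9" is "a" followed by e-acute; "a\u00b7b" contains MIDDLE DOT.
    for s in ["foo_1", "a\u00e9", "a\u00b7b"]:
        print(ascii(s), is_ascii_ident(s), is_xid_ident(s), is_word_run(s))
```

Under those assumptions the two patterns are not equivalent ("a\u00e9" is
accepted only by the XID form), and \w is not a synonym for XID_Continue in
Python (U+00B7 MIDDLE DOT is XID_Continue but is not matched by \w).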
>> I would very, ***VERY*** strongly like to see an analysis of whether
>> this change is going to break existing tools that rely on the above
>> definition of identifiers.
>
> These tools probably already don't work with the set of characters that is
> currently allowed. For example, emojis are currently allowed and are not
> matched by your regex.
> Note also that "a\u00e9" can appear verbatim in code, which the regex
> wouldn't match.
Okay, maybe not, but then I suppose my point is that if we're going to
fix it, I would like to *fix* it, not just make it less broken.
-- Matthew
Received on 2020-06-18 09:36:42