On Thu, Jun 18, 2020 at 8:24 AM Corentin Jabot <corentinjabot@gmail.com> wrote:

On Thu, 18 Jun 2020 at 17:08, Matthew Woehlke <mwoehlke.floss@gmail.com> wrote:
On 18/06/2020 10.46, JF Bastien wrote:
> On Thu, Jun 18, 2020 at 7:44 AM Tom Honermann wrote:
>> On 6/18/20 10:33 AM, Matthew Woehlke via Ext wrote:
>>> Okay, maybe not, but then I suppose my point is that if we're going to fix
>>> it, I would like to *fix* it, not just make it less broken.
>>
>> What particular form of "*fix*" do you have in mind?

I believe I already explained that. To repeat, make identifiers conform
to '[_[:alpha:]][_[:alnum:]]*'.

> I'd like to understand what is "broken" first :-)
> Escaping characters?
> Or something about tools which try to naively process C++ code? i.e. are we
> trying to make naive tools easier?

That depends on your definition of "easier". The goal isn't so much to
make it easier to write a tool correctly, but to make it so that
*existing* tools¹ are correct w.r.t. the standard.

Note that "tools" here includes humans. At least for me, the above
definition is muscle memory (and also very, very easy to type; usually
as '\w+', ignoring that this will catch stuff like '9to5' since such
false positives are rare).
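
A minimal sketch of the difference between the two patterns mentioned above, written in Python purely for illustration. Python's `re` module has no POSIX classes, so `[^\W\d]` is used here as an approximate stand-in for `[_[:alpha:]]`; the names `STRICT_IDENT`, `LOOSE_IDENT`, `is_strict_identifier`, and `is_loose_identifier` are hypothetical and not taken from any tool in this thread.

```python
import re

# Approximation of '[_[:alpha:]][_[:alnum:]]*'.
# [^\W\d] means "a word character that is not a decimal digit",
# which is an assumption standing in for [_[:alpha:]].
STRICT_IDENT = re.compile(r"[^\W\d]\w*")

# The shorthand that is quick to type but also accepts a leading digit.
LOOSE_IDENT = re.compile(r"\w+")


def is_strict_identifier(text):
    """Return True if the whole string matches the stricter pattern."""
    return STRICT_IDENT.fullmatch(text) is not None


def is_loose_identifier(text):
    """Return True if the whole string matches the '\\w+' shorthand."""
    return LOOSE_IDENT.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["foo_bar1", "_x", "9to5"]:
        print(candidate,
              is_strict_identifier(candidate),
              is_loose_identifier(candidate))
```

With these definitions, '9to5' is rejected by the stricter pattern but accepted by the shorthand, which is the false positive described above.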

The alternative is to convince every text editor, text tool² and text
processing library in existence that '\w' is '\p{XID_Continue}' and not
'[_[:alnum:]]' as it is currently defined (by, AFAIK, *everyone*).
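
To make the gap between the two character sets concrete, here is a small sketch in Python. It relies on two assumptions that are about Python, not about the C++ proposal: Python's Unicode-aware `\w` behaves like an alphanumeric-plus-underscore class, and `str.isidentifier()` follows Python's own identifier rule, which is built on XID_Start and XID_Continue. The helper names are hypothetical.

```python
import re

WORD = re.compile(r"\w+")


def matches_word_class(text):
    """True if every character is matched by Python's Unicode-aware \\w."""
    return WORD.fullmatch(text) is not None


def is_xid_identifier(text):
    """True if the string is an identifier under Python's XID-based rule."""
    return text.isidentifier()


# U+00B7 MIDDLE DOT is punctuation, so it is not alphanumeric,
# but Unicode lists it in XID_Continue.
sample = "a\u00b7b"

if __name__ == "__main__":
    print(matches_word_class(sample), is_xid_identifier(sample))
```

The point of the sketch is only that a '\w'-style search and an XID-based definition can disagree on the same string; it says nothing about how any particular editor or compiler behaves.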

I would challenge anyone to show me an existing tool³ which uses the
proposed definition of identifiers. I can name a good half dozen, just
off the top of my head, that use *my* proposed definition.

I'm puzzled by your use case. How often do you use a regex to find identifiers?
And which tools do that?

FWIW, you have to run the preprocessor before running the regex.

(¹ I'll assume use of a Unicode-correct definition of '[[:alnum:]]'. For
tools that get that wrong, I'm happy to label the tool "broken".)

(² *cough*grep*cough*)

(³ Given the paper, it would seem like even compilers probably don't use
the proposal, but anyway, name some non-compiler tools...)

--
Matthew