
SG16


Subject: Re: [isocpp-ext] P1949R4 - C++ Identifier Syntax using Unicode Standard Annex 31
From: Martinho Fernandes (rmf_at_[hidden])
Date: 2020-06-18 11:52:58


On Thu, Jun 18, 2020 at 6:26 PM Matthew Woehlke via SG16 <
sg16_at_[hidden]> wrote:

> On 18/06/2020 11.23, Corentin Jabot wrote:
> > I'm puzzled by your use case. How often do you use a regex to find
> > identifiers?
>
> Often, possibly as often as "daily". Certainly when I want to apply code
> transformations.
>

I feel like there is an important distinction that needs to be made clear
here. There is a world of difference between grepping for `m_length` and
grepping for `[_[:alpha:]][_[:alnum:]]*`. One is meant to find a specific
blob of text; the other is meant to find text with a particular grammatical
function. I use "meant" here intentionally. Consider that grepping for
`[_[:alpha:]][_[:alnum:]]*` will not give you a list of identifiers in your
source code unless your source code has absolutely no comments. If your
source code has comments (and those comments are human-readable),
`[_[:alpha:]][_[:alnum:]]*` will match all the words in those comments.
Every single one of them. I have serious trouble accepting that anyone needs
this functionality ever, much less daily. To me the whole endeavour feels
like the moral equivalent of the "parsing HTML with regex" meme: a losing
proposition from the very start. To locate identifiers reliably in this
manner you need a C++ parser, even if your code uses the basic source
character set exclusively.
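To make the comment problem concrete, the following is a minimal C++ sketch (not part of the original thread) that applies the naive identifier pattern `[_[:alpha:]][_[:alnum:]]*` using std::regex, whose default ECMAScript grammar accepts POSIX class names inside brackets. The functions naive_identifiers and strip_line_comments are hypothetical helpers written for illustration. strip_line_comments is deliberately crude: it only handles `//` comments, ignores `/* */` comments and raw strings, and would also cut a `//` that appears inside a string literal. Even after comments are stripped, the pattern still reports keywords and the words inside string literals, which is consistent with the point that you need at least a real lexer, and in practice a parser.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every substring of `text` that matches the
// naive "identifier" pattern, with no knowledge of C++ lexical structure.
std::vector<std::string> naive_identifiers(const std::string& text) {
    // [:alpha:] / [:alnum:] inside brackets are supported by std::regex's
    // default ECMAScript grammar.
    static const std::regex ident(R"([_[:alpha:]][_[:alnum:]]*)");
    std::vector<std::string> out;
    for (auto it = std::sregex_iterator(text.begin(), text.end(), ident);
         it != std::sregex_iterator(); ++it) {
        out.push_back(it->str());
    }
    return out;
}

// Hypothetical helper: drop everything from "//" to the end of the line.
// Deliberately naive: it does not know about string literals, /* */
// comments, raw strings, or line splices, so it is NOT a C++ lexer.
std::string strip_line_comments(const std::string& text) {
    std::string out;
    bool in_comment = false;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (!in_comment && text.compare(i, 2, "//") == 0) in_comment = true;
        if (text[i] == '\n') in_comment = false;  // comment ends at newline
        if (!in_comment) out += text[i];
    }
    return out;
}
```

For the input `int m_length = 0; // cache the length`, naive_identifiers reports `int`, `m_length`, `cache`, `the` and `length`: three of the five matches are comment words and one is a keyword. Running it on the output of strip_line_comments removes the comment words, but `int`, `const`, `char` and the contents of any string literal are still reported as "identifiers".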



SG16 list run by sg16-owner@lists.isocpp.org