Subject: Re: [isocpp-ext] P1949R4 - C++ Identifier Syntax using Unicode Standard Annex 31
From: JF Bastien (cxx_at_[hidden])
Date: 2020-06-18 10:25:42

On Thu, Jun 18, 2020 at 8:24 AM Corentin Jabot <corentinjabot_at_[hidden]>
wrote:

> On Thu, 18 Jun 2020 at 17:08, Matthew Woehlke <mwoehlke.floss_at_[hidden]>
> wrote:
>> On 18/06/2020 10.46, JF Bastien wrote:
>> > On Thu, Jun 18, 2020 at 7:44 AM Tom Honermann wrote:
>> >> On 6/18/20 10:33 AM, Matthew Woehlke via Ext wrote:
>> >>> Okay, maybe not, but then I suppose my point is that if we're going
>> >>> to fix it, I would like to *fix* it, not just make it less broken.
>> >>
>> >> What particular form of "*fix*" do you have in mind?
>> I believe I already explained that. To repeat, make identifiers conform
>> to '[_[:alpha:]][_[:alnum:]]*'.
>> > I'd like to understand what is "broken" first :-)
>> > Escaping characters?
>> > Or something about tools which try to naively process C++ code? i.e.
>> > are we trying to make naive tools easier?
>> That depends on your definition of "easier". The goal isn't so much to
>> make it easier to write a tool correctly, but to make it so that
>> *existing* tools¹ are correct w.r.t. the standard.
>> Note that "tools" here includes humans. At least for me, the above
>> definition is muscle memory (and also very, very easy to type; usually
>> as '\w+', ignoring that this will catch stuff like '9to5' since such
>> false positives are rare).
>> The alternative is to convince every text editor, text tool² and text
>> processing library in existence that '\w' is '\p{XID_Continue}' and not
>> '[_[:alnum:]]' as it is currently defined (by, AFAIK, *everyone*).
>> I would challenge anyone to show me an existing tool³ which uses the
>> proposed definition of identifiers. I can name a good half dozen, just
>> off the top of my head, that use *my* proposed definition.
> I'm puzzled by your use case. How often do you use a regex to find
> identifiers?
> And which tools do that?

FWIW, you have to run the preprocessor before running the regex.
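
One way to read that remark, as a minimal sketch: token pasting can create identifiers that never appear verbatim in the source text, so a regex run over the unpreprocessed file cannot see them. The macro name `MAKE_NAME` and the variable `counter_1` are hypothetical, chosen only for illustration.

```cpp
// Hypothetical macro: pastes two tokens into a single identifier.
#define MAKE_NAME(prefix, n) prefix##n

// After preprocessing this declares a variable named counter_1.
// The text "counter_1" does not occur on this line of code before
// preprocessing; a regex over the raw source finds only "MAKE_NAME",
// "counter_" and "1".
int MAKE_NAME(counter_, 1) = 42;

// Code elsewhere can refer to the pasted name directly.
int read_counter() { return counter_1; }
```

Universal-character-names and line splicing raise similar questions, but this sketch does not cover them.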

>> (¹ I'll assume use of a Unicode-correct definition of '[[:alnum:]]'. For
>> tools that get that wrong, I'm happy to label the tool "broken".)
>> (² *cough*grep*cough*)
>> (³ Given the paper, it would seem like even compilers probably don't use
>> the proposal, but anyway, name some non-compiler tools...)
>> --
>> Matthew
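
For readers who want to see the two patterns quoted above side by side, here is a rough sketch using `std::regex`. The function names are hypothetical. `std::regex` over `char` is byte-oriented and has no support for Unicode properties such as `\p{XID_Continue}`, so this covers ASCII input only and says nothing about the Unicode side of the discussion.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the whole token matches the stricter
// pattern quoted in the thread, '[_[:alpha:]][_[:alnum:]]*'.
bool is_strict_identifier(const std::string& token) {
    static const std::regex re("[_[:alpha:]][_[:alnum:]]*");
    return std::regex_match(token, re);  // must match the entire token
}

// Hypothetical helper: true if the whole token matches the '\w+'
// shorthand, which also accepts a leading digit.
bool is_word_run(const std::string& token) {
    static const std::regex re("\\w+");
    return std::regex_match(token, re);
}
```

With these helpers, `foo_1` is accepted by both, while `9to5` is rejected by `is_strict_identifier` and accepted by `is_word_run`, which is the false positive mentioned in the quoted message.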

SG16 list run by sg16-owner@lists.isocpp.org