Subject: Re: [isocpp-ext] P1949R4 - C++ Identifier Syntax using Unicode Standard Annex 31
From: Steve Downey (sdowney_at_[hidden])
Date: 2020-06-18 11:35:38

If you run the preprocessor and preserve the universal-character-name
conversion, all the characters you see will be in the basic source
character set, and non-identifiers will match the regex. Direct Unicode
source code is becoming far more portable. Regex engines that don't
support Unicode classifications are going to become less useful.
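
As a rough illustration of the two patterns discussed in the quoted thread, here is a minimal sketch using std::regex in the default "C" locale. The helper names are made up for this example, and the UCN-spelled input assumes a preprocessor that keeps universal-character-names as written.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: the strict pattern from the thread,
// '[_[:alpha:]][_[:alnum:]]*', applied to a whole token.
bool is_strict_identifier(const std::string& token) {
    static const std::regex pattern("[_[:alpha:]][_[:alnum:]]*");
    return std::regex_match(token, pattern);
}

// Hypothetical helper: the '\w+' shorthand, which also accepts
// tokens that start with a digit.
bool is_word(const std::string& token) {
    static const std::regex pattern("\\w+");
    return std::regex_match(token, pattern);
}
```

With these helpers, "9to5" is accepted by is_word but rejected by is_strict_identifier, and a UCN-spelled name such as caf\u00e9 is rejected by both, because the backslash is outside either character class.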

On Thu, Jun 18, 2020 at 11:26 AM JF Bastien via Ext <ext_at_[hidden]>
wrote:

> On Thu, Jun 18, 2020 at 8:24 AM Corentin Jabot <corentinjabot_at_[hidden]>
> wrote:
>> On Thu, 18 Jun 2020 at 17:08, Matthew Woehlke <mwoehlke.floss_at_[hidden]>
>> wrote:
>>> On 18/06/2020 10.46, JF Bastien wrote:
>>> > On Thu, Jun 18, 2020 at 7:44 AM Tom Honermann wrote:
>>> >> On 6/18/20 10:33 AM, Matthew Woehlke via Ext wrote:
>>> >>> Okay, maybe not, but then I suppose my point is that if we're going to
>>> >>> fix it, I would like to *fix* it, not just make it less broken.
>>> >>
>>> >> What particular form of "*fix*" do you have in mind?
>>> I believe I already explained that. To repeat, make identifiers conform
>>> to '[_[:alpha:]][_[:alnum:]]*'.
>>> > I'd like to understand what is "broken" first :-)
>>> > Escaping characters?
>>> > Or something about tools which try to naively process C++ code? i.e. are
>>> > we trying to make naive tools easier?
>>> That depends on your definition of "easier". The goal isn't so much to
>>> make it easier to write a tool correctly, but to make it so that
>>> *existing* tools¹ are correct w.r.t. the standard.
>>> Note that "tools" here includes humans. At least for me, the above
>>> definition is muscle memory (and also very, very easy to type; usually
>>> as '\w+', ignoring that this will catch stuff like '9to5' since such
>>> false positives are rare).
>>> The alternative is to convince every text editor, text tool² and text
>>> processing library in existence that '\w' is '\p{XID_Continue}' and not
>>> '[_[:alnum:]]' as it is currently defined (by, AFAIK, *everyone*).
>>> I would challenge anyone to show me an existing tool³ which uses the
>>> proposed definition of identifiers. I can name a good half dozen, just
>>> off the top of my head, that use *my* proposed definition.
>> I'm puzzled by your use case. How often do you use a regex to find
>> identifiers?
>> And which tools do that?
> FWIW, you have to run the preprocessor before running the regex.
>>> (¹ I'll assume use of a Unicode-correct definition of '[[:alnum:]]'. For
>>> tools that get that wrong, I'm happy to label the tool "broken".)
>>> (² *cough*grep*cough*)
>>> (³ Given the paper, it would seem like even compilers probably don't use
>>> the proposal, but anyway, name some non-compiler tools...)
>>> --
>>> Matthew
>> _______________________________________________
> Ext mailing list
> Ext_at_[hidden]
> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/ext
> Link to this post: http://lists.isocpp.org/ext/2020/06/14268.php
