
sg16


Help request: regex for pp-number with XID_Continue

From: will wray <wjwray_at_[hidden]>
Date: Mon, 12 Dec 2022 18:15:16 -0400
I'm fixing an issue in pcpp, the pure-Python C preprocessor,
and I'm out of my depth with Unicode regexes (and with preprocessing).

I'm proposing to add a PP_NUMBER token, matched by the regex below, that
accepts pp-numbers whose identifier-continue is limited to digit and nondigit:

r"\.?[0-9](?:[eEpP][-+]|\.|'?[0-9A-Za-z_])*"

The pp-number grammar at https://eel.is/c++draft/lex.ppnumber#ntref:pp-number
uses identifier-continue, and the linked definition of identifier-continue includes

> 'an element of the translation character set of class XID_Continue'

So I guess the regex should be extended to match XID_Continue characters.
Does anyone here know how to do that, with Python's re as the regex engine?

Are there any other pp-tokens that should be considered for Unicode support?
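Identifiers look like the other obvious candidate: in the same draft, identifier-start includes characters with XID_Start and identifier-continue includes characters with XID_Continue. A minimal scanner sketch, assuming the same str.isidentifier() trick; match_identifier is a hypothetical helper name, and the Unicode tables again come from the running Python.

```python
from typing import Optional


def match_identifier(text: str, pos: int = 0) -> Optional[str]:
    """Return the identifier that starts at text[pos], or None."""
    end = pos
    # A one-character string is an identifier iff that character is
    # XID_Start or '_' (which covers C++'s nondigit).
    if end < len(text) and text[end].isidentifier():
        end += 1
        # Prefixing "a" (XID_Start) makes isidentifier() test XID_Continue only.
        while end < len(text) and ("a" + text[end]).isidentifier():
            end += 1
        return text[pos:end]
    return None
```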
Thanks for any pointers.
--------

Bonus follow on preprocessing question (not unicode related):

For a pure preprocessor, divorced from a C or C++ compiler,
is there any need for cpp-integer and cpp-float tokens?

My PR removes pcpp's CPP_INTEGER and CPP_FLOAT tokens entirely; they
appear to be superfluous in a pure preprocessor that stops at phase 4.
All tests pass, and the PR fixes both my issue and one other.
As far as I can tell, the only place the CPP_ tokens might have been
needed is the evaluator for #if conditional expressions, and it doesn't
perform cpp-token tokenization...
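If the #if evaluator receives PP_NUMBER tokens, it can interpret them as integer literals at evaluation time, which may be why separate integer tokens are not needed. A rough sketch, assuming the C++23 integer-literal grammar (hex, binary, octal, decimal, ' digit separators, u/l/ll/z suffixes); pp_number_to_int is a hypothetical helper, not pcpp's API, and it ignores the signedness implied by a u suffix as well as intmax_t range checks.

```python
import re

# Assumed C++23 integer-literal shape: a body plus an optional suffix made of
# u/U combined with one of l/L, ll/LL, z/Z, in either order.
_INT_LITERAL = re.compile(
    r"(?P<body>0[xX][0-9a-fA-F']+|0[bB][01']+|[0-9][0-9']*)"
    r"(?P<suffix>(?:[uU](?:ll|LL|[lLzZ])?|(?:ll|LL|[lLzZ])[uU]?)?)"
)


def pp_number_to_int(tok: str) -> int:
    """Interpret a pp-number from an #if expression as an integer value."""
    m = _INT_LITERAL.fullmatch(tok)
    if m is None:
        raise ValueError(f"not an integer literal: {tok!r}")
    body = m.group("body")
    prefix = body[:2].lower()
    if prefix in ("0x", "0b"):
        base, digits = (16 if prefix == "0x" else 2), body[2:]
    elif len(body) > 1 and body[0] == "0":
        base, digits = 8, body  # keep the leading 0 so "0'17" stays valid
    else:
        base, digits = 10, body
    # A digit separator must sit between two digits.
    if digits.startswith("'") or digits.endswith("'") or "''" in digits:
        raise ValueError(f"misplaced digit separator: {tok!r}")
    # int() raises ValueError for out-of-base digits such as the 8 in "08".
    return int(digits.replace("'", ""), base)
```

For example, pp_number_to_int("0x1F") gives 31 and pp_number_to_int("1'000") gives 1000, while "1.0" or "08" raise ValueError, which an #if evaluator could report as an error.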

Received on 2022-12-12 22:15:28