Re: Pattern Syntax and Whitespace

From: Steve Downey <sdowney_at_[hidden]>
Date: Wed, 14 Sep 2022 21:17:54 -0400
Hopefully I'll be clearer in the draft. Our whitespace definition is
entirely disjoint from identifier characters, and is a subset of the Unicode
whitespace characters. Our syntax characters are limited to the basic
source characters, and are likewise a subset of what Unicode allows.
My intention is to document what we do, using Unicode notation, but to make
no actual changes to lexing or parsing.
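
To make the subset claim concrete, here is a rough sketch rather than anything normative. The Pattern_White_Space list is the one published in the Unicode Character Database. The C++ list is an assumption for illustration: the five whitespace characters of the basic character set, with carriage return left out on the assumption that it is handled as a line ending before tokenization. The names cpp_whitespace, contains and is_subset are made up for this example.

```cpp
#include <array>
#include <cstddef>

// Pattern_White_Space code points, as listed in the Unicode Character
// Database (PropList.txt): U+0009..U+000D, U+0020, U+0085,
// U+200E..U+200F, U+2028, U+2029.
constexpr std::array<char32_t, 11> pattern_white_space{
    0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020,
    0x0085, 0x200E, 0x200F, 0x2028, 0x2029};

// Assumption for illustration: horizontal tab, new-line, vertical tab,
// form feed and space. This is not a quote from the standard.
constexpr std::array<char32_t, 5> cpp_whitespace{
    0x0009, 0x000A, 0x000B, 0x000C, 0x0020};

// True if code point c appears in set.
template <std::size_t N>
constexpr bool contains(const std::array<char32_t, N>& set, char32_t c) {
    for (std::size_t i = 0; i < N; ++i)
        if (set[i] == c) return true;
    return false;
}

// True if every code point of a also appears in b.
template <std::size_t N, std::size_t M>
constexpr bool is_subset(const std::array<char32_t, N>& a,
                         const std::array<char32_t, M>& b) {
    for (std::size_t i = 0; i < N; ++i)
        if (!contains(b, a[i])) return false;
    return true;
}

// The profile is a subset of Pattern_White_Space, and a strict one.
static_assert(is_subset(cpp_whitespace, pattern_white_space));
static_assert(!is_subset(pattern_white_space, cpp_whitespace));
```

This needs C++17 for the message-less static_assert. A similar check could be written for the syntax characters against Pattern_Syntax, but that table is much longer and is omitted here.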

In some future standard, extending whitespace ought to be reasonably
straightforward.
Extending the set we use for syntax would be fraught.

On Wed, Sep 14, 2022, 21:05 Hubert Tong <hubert.reinterpretcast_at_[hidden]>
wrote:

> On Wed, Sep 14, 2022 at 7:11 PM Steve Downey via SG16 <
> sg16_at_[hidden]> wrote:
>
>> The new Unicode 15.0 version of UAX 31 clarifies
>> https://unicode.org/reports/tr31/#Pattern_Syntax that the definitions of
>> whitespace and characters used for syntax are intended to apply to
>> programming languages. C++ does not use them, of course, nor am I
>> suggesting we should make this change now.
>>
>> For purposes of the conformance annex, Annex E,
>> http://eel.is/c++draft/uaxid#pattern, I am thinking of extracting the
>> profile we do use from our lexer defs, and referring back to those
>> sections as being normative. Does that seem reasonable?
>>
>
> UAX31-R3 claims:
> When meeting this requirement, all characters except those that have the
> Pattern_White_Space or Pattern_Syntax properties are available for use as
> identifiers or literals.
>
> Can you explain how that is true (especially for C++ identifiers) with a
> profile?
>
> Also, is it clear that our identifier grammar confers syntactic relevance
> to, for example, XID_Start characters for this profile?
>
>
>>
>> They've also clarified in a few places where the intent has always been
>> to choose from one of the alternatives, where for example Restricted Format
>> Characters R1a is ruled out if you're following R1,
>> https://unicode.org/reports/tr31/#R1 and
>> https://unicode.org/reports/tr31/#R1a , because the characters are
>> excluded from XID_Continue.
>>
>> My plan is to have a draft early next week, and it's currently on our NB
>> comment list, with a note to have changes to refer to.
>>
>> I'll also add the updates from 14 to 15 as comments. Comments are due for
>> us to INCITS on the 28th, so internal discussion is leaning towards getting
>> in anything we think should be considered, and better two comments than
>> none.
>> --
>> SG16 mailing list
>> SG16_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>
>

Received on 2022-09-15 01:18:07