Re: Pattern Syntax and Whitespace

From: Steve Downey <sdowney_at_[hidden]>
Date: Wed, 14 Sep 2022 22:48:39 -0400

On Wed, Sep 14, 2022, 22:18 Hubert Tong <hubert.reinterpretcast_at_[hidden]>
wrote:

> On Wed, Sep 14, 2022 at 9:19 PM Steve Downey <sdowney_at_[hidden]> wrote:
>
>> We said it doesn't apply for the wrong reasons. It still doesn't apply,
>> but we should explain why.
>>
>
> You mean the intent is to state that we don't meet UAX-R3 at all?
> If the intent is instead to state that we meet UAX-R3-2, then I am not
> sure that there is enough clarity over what is meant by "syntactic use".
>

I think we might eventually harmonize them.
My intent right now is to disclaim conformance, as we already do, and
describe what we do, with references back to normative text.

If describing what we use for syntax, in the sense that UAX31 describes, is
controversial, or we don't agree on what it means, then saying we're not
conforming and pointing to the relevant parts of [lex] without further detail
would be entirely acceptable to me.

We've done the right thing, for the wrong reason, and I'd like to replace
the reason, without claiming anything more.



>
>>
>> On Wed, Sep 14, 2022, 21:17 Steve Downey <sdowney_at_[hidden]> wrote:
>>
>>> Hopefully I'll be clearer in the draft. Our whitespace definition is
>>> entirely disjoint from identifier characters, but a subset of the Unicode
>>> whitespace characters. Our syntax characters are limited to the basic
>>> source characters, and are also a subset of what Unicode allows.
>>> My intention is to document what we do, using Unicode notation, but make
>>> no actual changes to lexing or parsing.
>>>
>>> In some future standard, extending whitespace ought to be reasonably
>>> straightforward.
>>> Extending the set we use for syntax would be fraught.
>>>
>>> On Wed, Sep 14, 2022, 21:05 Hubert Tong <
>>> hubert.reinterpretcast_at_[hidden]> wrote:
>>>
>>>> On Wed, Sep 14, 2022 at 7:11 PM Steve Downey via SG16 <
>>>> sg16_at_[hidden]> wrote:
>>>>
>>>>> The new Unicode 15.0 version of UAX 31 clarifies
>>>>> https://unicode.org/reports/tr31/#Pattern_Syntax that the definitions
>>>>> of whitespace and characters used for syntax are intended to apply to
>>>>> programming languages. C++ does not use them, of course, nor am I
>>>>> suggesting we should make this change now.
>>>>>
>>>>> For purposes of the conformance annex, E,
>>>>> http://eel.is/c++draft/uaxid#pattern, I am thinking of extracting the
>>>>> profile we do use from our lexer defs, and refer back to those sections as
>>>>> being normative. Does that seem reasonable?
>>>>>
>>>>
>>>> UAX-R3 claims:
>>>> When meeting this requirement, all characters except those that have
>>>> the Pattern_White_Space or Pattern_Syntax properties are available for use
>>>> as identifiers or literals.
>>>>
>>>> Can you explain how that is true (especially for C++ identifiers) with
>>>> a profile?
>>>>
>>>> Also, is it clear that our identifier grammar confers syntactic
>>>> relevance to, for example, XID_Start characters for this profile?
>>>>
>>>>
>>>>>
>>>>> They've also clarified in a few places where the intent has always
>>>>> been to choose from one of the alternatives, where for example Restricted
>>>>> Format Characters R1a is ruled out if you're following R1,
>>>>> https://unicode.org/reports/tr31/#R1 and
>>>>> https://unicode.org/reports/tr31/#R1a , because the characters are
>>>>> excluded from XID_Continue.
>>>>>
>>>>> My plan is to have a draft early next week, and it's currently on our
>>>>> NB comment list, with a note to have changes to refer to.
>>>>>
>>>>> I'll also add the updates from 14 to 15 as comments. Comments are due
>>>>> for us to INCITS on the 28th, so internal discussion is leaning towards
>>>>> getting in anything we think should be considered, and better two comments
>>>>> than none.
>>>>> --
>>>>> SG16 mailing list
>>>>> SG16_at_[hidden]
>>>>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>>>>
>>>>

Received on 2022-09-15 02:48:53