Re: Pattern Syntax and Whitespace

From: Hubert Tong <hubert.reinterpretcast_at_[hidden]>
Date: Wed, 14 Sep 2022 22:18:17 -0400
On Wed, Sep 14, 2022 at 9:19 PM Steve Downey <sdowney_at_[hidden]> wrote:

> We said it doesn't apply for the wrong reasons. It still doesn't apply,
> but we should explain why.
>

You mean the intent is to state that we don't meet UAX31-R3 at all?
If the intent is instead to state that we meet UAX31-R3-2, then I am not sure
that there is enough clarity over what is meant by "syntactic use".


>
> On Wed, Sep 14, 2022, 21:17 Steve Downey <sdowney_at_[hidden]> wrote:
>
>> Hopefully I'll be clearer in the draft. Our whitespace definition is
>> entirely disjoint from identifier characters, but a subset of the Unicode
>> whitespace characters. Our syntax characters are limited to the basic
>> source characters, and are also a subset of what Unicode allows.
>> My intention is to document what we do, using Unicode notation, but make
>> no actual changes to lexing or parsing.
>>
>> In some future standard, extending whitespace ought to be reasonably
>> straightforward.
>> Extending the set we use for syntax would be fraught.
>>
>> On Wed, Sep 14, 2022, 21:05 Hubert Tong <hubert.reinterpretcast_at_[hidden]>
>> wrote:
>>
>>> On Wed, Sep 14, 2022 at 7:11 PM Steve Downey via SG16 <
>>> sg16_at_[hidden]> wrote:
>>>
>>>> The new Unicode 15.0 version of UAX 31 clarifies
>>>> https://unicode.org/reports/tr31/#Pattern_Syntax that the definitions
>>>> of whitespace and characters used for syntax are intended to apply to
>>>> programming languages. C++ does not use them, of course, nor am I
>>>> suggesting we should make this change now.
>>>>
>>>> For purposes of the conformance annex, E,
>>>> http://eel.is/c++draft/uaxid#pattern, I am thinking of extracting the
>>>> profile we do use from our lexer definitions and referring back to those
>>>> sections as being normative. Does that seem reasonable?
>>>>
>>>
>>> UAX31-R3 claims:
>>> When meeting this requirement, all characters except those that have the
>>> Pattern_White_Space or Pattern_Syntax properties are available for use as
>>> identifiers or literals.
>>>
>>> Can you explain how that is true (especially for C++ identifiers) with a
>>> profile?
>>>
>>> Also, is it clear that our identifier grammar confers syntactic
>>> relevance to, for example, XID_Start characters for this profile?
>>>
>>>
>>>>
>>>> They've also clarified in a few places where the intent has always been
>>>> to choose from one of the alternatives, where for example Restricted Format
>>>> Characters R1a is ruled out if you're following R1,
>>>> https://unicode.org/reports/tr31/#R1 and
>>>> https://unicode.org/reports/tr31/#R1a , because the characters are
>>>> excluded from XID_Continue.
>>>>
>>>> My plan is to have a draft early next week, and it's currently on our
>>>> NB comment list, with a note to have changes to refer to.
>>>>
>>>> I'll also add the updates from 14 to 15 as comments. Comments are due
>>>> for us to INCITS on the 28th, so internal discussion is leaning towards
>>>> getting in anything we think should be considered, and better two comments
>>>> than none.
>>>> --
>>>> SG16 mailing list
>>>> SG16_at_[hidden]
>>>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>>>
>>>
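
A minimal sketch of the subset relationship described in the quoted text above: the whitespace C++ recognizes is a subset of the Unicode Pattern_White_Space set. It assumes the C++ whitespace characters are space, horizontal tab, vertical tab, form feed, and new-line, and takes the Pattern_White_Space code points from the Unicode property list. The names `cpp_whitespace`, `pattern_white_space`, and the helper functions are hypothetical, chosen only for illustration; this is not proposed wording and changes nothing about lexing.

```cpp
#include <array>
#include <cstddef>

// Pattern_White_Space code points (Unicode PropList.txt).
constexpr std::array<char32_t, 11> pattern_white_space{
    0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020,
    0x0085, 0x200E, 0x200F, 0x2028, 0x2029};

// Assumed C++ whitespace set: HT, new-line (LF), VT, FF, space.
constexpr std::array<char32_t, 5> cpp_whitespace{
    0x0009, 0x000A, 0x000B, 0x000C, 0x0020};

// Linear membership test; the sets are tiny, so this is sufficient.
template <std::size_t N>
constexpr bool contains(const std::array<char32_t, N>& set, char32_t c) {
    for (char32_t x : set) {
        if (x == c) return true;
    }
    return false;
}

constexpr bool is_pattern_white_space(char32_t c) {
    return contains(pattern_white_space, c);
}

constexpr bool is_cpp_whitespace(char32_t c) {
    return contains(cpp_whitespace, c);
}

// True when every assumed C++ whitespace character is also
// Pattern_White_Space, i.e. the C++ set is a profile that only removes.
constexpr bool cpp_whitespace_is_subset() {
    for (char32_t c : cpp_whitespace) {
        if (!is_pattern_white_space(c)) return false;
    }
    return true;
}

static_assert(cpp_whitespace_is_subset());
// U+2028 LINE SEPARATOR is Pattern_White_Space but not C++ whitespace.
static_assert(is_pattern_white_space(0x2028) && !is_cpp_whitespace(0x2028));
```

The checks run at compile time, so the sketch documents the relationship without touching any lexer. Whether the identifier grammar is disjoint from both sets is not checked here, since that would need the XID_Start and XID_Continue tables.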

Received on 2022-09-15 02:18:45