Date: Wed, 14 Sep 2022 21:19:33 -0400
We said it doesn't apply for the wrong reasons. It still doesn't apply, but
we should explain why.
On Wed, Sep 14, 2022, 21:17 Steve Downey <sdowney_at_[hidden]> wrote:
> Hopefully I'll be clearer in the draft. Our whitespace definition is
> entirely disjoint from identifier characters, but a subset of the Unicode
> whitespace characters. Our syntax characters are limited to the basic
> source characters, and are also a subset of what Unicode allows.
> My intention is to document what we do, using Unicode notation, but make
> no actual changes to lexing or parsing.
>
> In some future standard, extending whitespace ought to be reasonably
> straightforward.
> Extending the set we use for syntax would be fraught.
>
> On Wed, Sep 14, 2022, 21:05 Hubert Tong <hubert.reinterpretcast_at_[hidden]>
> wrote:
>
>> On Wed, Sep 14, 2022 at 7:11 PM Steve Downey via SG16 <
>> sg16_at_[hidden]> wrote:
>>
>>> The new Unicode 15.0 version of UAX 31 clarifies
>>> https://unicode.org/reports/tr31/#Pattern_Syntax that the definitions
>>> of whitespace and characters used for syntax are intended to apply to
>>> programming languages. C++ does not use them, of course, nor am I
>>> suggesting we should make this change now.
>>>
>>> For purposes of the conformance annex, E,
>>> http://eel.is/c++draft/uaxid#pattern, I am thinking of extracting the
>>> profile we do use from our lexer defs, and refer back to those sections as
>>> being normative. Does that seem reasonable?
>>>
>>
>> UAX-R3 claims:
>> When meeting this requirement, all characters except those that have the
>> Pattern_White_Space or Pattern_Syntax properties are available for use as
>> identifiers or literals.
>>
>> Can you explain how that is true (especially for C++ identifiers) with a
>> profile?
>>
>> Also, is it clear that our identifier grammar confers syntactic relevance
>> to, for example, XID_Start characters for this profile?
>>
>>
>>>
>>> They've also clarified in a few places where the intent has always been
>>> to choose from one of the alternatives, where for example Restricted Format
>>> Characters R1a is ruled out if you're following R1,
>>> https://unicode.org/reports/tr31/#R1 and
>>> https://unicode.org/reports/tr31/#R1a , because the characters are
>>> excluded from XID_Continue.
>>>
>>> My plan is to have a draft early next week, and it's currently on our NB
>>> comment list, with a note to have changes to refer to.
>>>
>>> I'll also add the updates from 14 to 15 as comments. Comments are due
>>> for us to INCITS on the 28th, so internal discussion is leaning towards
>>> getting in anything we think should be considered, and better two comments
>>> than none.
>>> --
>>> SG16 mailing list
>>> SG16_at_[hidden]
>>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>>
>>
Received on 2022-09-15 01:19:46