On Wed, Sep 14, 2022 at 10:48 PM Steve Downey <sdowney@gmail.com> wrote:


On Wed, Sep 14, 2022, 22:18 Hubert Tong <hubert.reinterpretcast@gmail.com> wrote:
On Wed, Sep 14, 2022 at 9:19 PM Steve Downey <sdowney@gmail.com> wrote:
We said it doesn't apply for the wrong reasons. It still doesn't apply, but we should explain why. 

You mean the intent is to state that we don't meet UAX-R3 at all?
If the intent is instead to state that we meet UAX-R3-2, then I am not sure that there is enough clarity over what is meant by "syntactic use".

I think we might eventually harmonize them. 
My intent right now is to disclaim conformance, as we already do, and describe what we do, with references back to normative text.

If describing what we use for syntax, in the sense that UAX31 describes, is controversial, or if we don't agree on what it means, then saying that we're not conforming and pointing to the relevant parts of [lex] without detail would be entirely acceptable to me.

We've done the right thing, for the wrong reason, and I'd like to replace the reason, without claiming anything more. 

Sounds good.

On Wed, Sep 14, 2022, 21:17 Steve Downey <sdowney@gmail.com> wrote:
Hopefully I'll be clearer in the draft. Our whitespace definition is entirely disjoint from the identifier characters, and is a subset of the Unicode whitespace characters. Our syntax characters are limited to the basic source characters, and are also a subset of what Unicode allows.
My intention is to document what we do, using Unicode notation, but make no actual changes to lexing or parsing. 
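
As a rough illustration of the whitespace subset claim, and not as proposed wording, here is a minimal C++17 sketch. The Pattern_White_Space code points are the ones listed in the Unicode property data. The `cxx_whitespace` list is an assumption based on one reading of [lex] (tab, line feed, vertical tab, form feed, space), and both function names are hypothetical.

```cpp
#include <array>

// Pattern_White_Space as listed in Unicode PropList.txt:
// U+0009..U+000D, U+0020, U+0085, U+200E, U+200F, U+2028, U+2029.
constexpr bool is_pattern_white_space(char32_t c) {
    return (c >= 0x0009 && c <= 0x000D) || c == 0x0020 || c == 0x0085 ||
           c == 0x200E || c == 0x200F || c == 0x2028 || c == 0x2029;
}

// Assumption: the whitespace characters C++ recognizes in [lex] are
// horizontal tab, line feed, vertical tab, form feed, and space.
constexpr std::array<char32_t, 5> cxx_whitespace = {
    0x0009, 0x000A, 0x000B, 0x000C, 0x0020};

// True when every assumed C++ whitespace character also has the
// Pattern_White_Space property, i.e. the C++ set is a subset.
constexpr bool cxx_whitespace_is_subset() {
    for (char32_t c : cxx_whitespace) {
        if (!is_pattern_white_space(c)) {
            return false;
        }
    }
    return true;
}

// Checked at compile time; no lexer behavior is involved.
static_assert(cxx_whitespace_is_subset(),
              "assumed C++ whitespace must be within Pattern_White_Space");
```

The check only runs one way: code points such as U+0085 or U+2028 have Pattern_White_Space but are not in the assumed C++ list, which is why extending whitespace later would be an addition rather than a conflict.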

In some future standard, extending whitespace ought to be reasonably straightforward. 
Extending the set we use for syntax would be fraught. 

On Wed, Sep 14, 2022, 21:05 Hubert Tong <hubert.reinterpretcast@gmail.com> wrote:
On Wed, Sep 14, 2022 at 7:11 PM Steve Downey via SG16 <sg16@lists.isocpp.org> wrote:
The new Unicode 15.0 version of UAX 31 clarifies (https://unicode.org/reports/tr31/#Pattern_Syntax) that the definitions of whitespace and of characters used for syntax are intended to apply to programming languages. C++ does not use them, of course, and I am not suggesting we make this change now.

For purposes of the conformance annex, Annex E (http://eel.is/c++draft/uaxid#pattern), I am thinking of extracting the profile we do use from our lexer definitions, and referring back to those sections as being normative. Does that seem reasonable?

UAX-R3 claims:
When meeting this requirement, all characters except those that have the Pattern_White_Space or Pattern_Syntax properties are available for use as identifiers or literals.

Can you explain how that is true (especially for C++ identifiers) with a profile?

Also, is it clear that our identifier grammar confers syntactic relevance to, for example, XID_Start characters for this profile?
 

They've also clarified a few places where the intent has always been to choose one of the alternatives. For example, R1a (Restricted Format Characters) is ruled out if you're following R1 (https://unicode.org/reports/tr31/#R1 and https://unicode.org/reports/tr31/#R1a), because the characters are excluded from XID_Continue.

My plan is to have a draft early next week, and it's currently on our NB comment list, with a note to have changes to refer to. 

I'll also add the updates from Unicode 14 to 15 as comments. Our comments are due to INCITS on the 28th, so internal discussion is leaning towards getting in anything we think should be considered; better two comments than none.
--
SG16 mailing list
SG16@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg16