sg16: Re: [SG16] D1949R4 - Unicode Identifiers

From: Tom Honermann <tom_at_[hidden]>
Date: Wed, 27 May 2020 01:24:52 -0400

Thanks, Steve. A few nit-picky comments below.

In the new "Summary" section, in addition to noting that emoji will no
longer be allowed in identifiers, I think it would be helpful to note
that identifiers previously allowed for some scripts will no longer be
allowed. This is mentioned in section 6.1, but I think it is also worth
mentioning in the summary.

In section 7, there is an instance of "C++. C++.".

Section 7 states that N3146 "considered using UAX31". My reading of
N3146 is that it did use UAX #31, but it adapted what was then called
the "Alternative Identifier Syntax" option. Unicode 9 renamed
"Alternative Identifier Syntax" to "Immutable Identifiers". The
relevant text from N3146 is:
>
> The set of UCNs *disallowed* in identifiers in C and C++ should
> exactly match the specification in [AltId], *with the following
> additions*: all characters in the Basic Latin (i.e. ASCII, basic
> source character) block, and all characters in the Unicode General
> Category "Separator, space".
>
[AltId] corresponds to:
>
> Unicode Standard Annex #31: Unicode Identifier and Pattern Syntax,
> "Alternative Identifier Syntax",
> http://www.unicode.org/reports/tr31/tr31-11.html#Alternative_Identifier_Syntax
>
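
To make the quoted rule concrete, here is a minimal Python sketch of only
the two *additions* N3146 makes on top of [AltId]: the Basic Latin block and
General Category "Separator, space" (Zs). The function name
is_n3146_addition is hypothetical. The [AltId] set itself is not modeled,
since it depends on properties such as Pattern_Syntax that Python's
unicodedata module does not expose.

```python
import unicodedata


def is_n3146_addition(ch: str) -> bool:
    """Return True if ch is covered by one of the two additions quoted above.

    Hypothetical helper for illustration only; it does not implement the
    [AltId] exclusions, just the extra characters N3146 layers on top.
    """
    # Basic Latin block: U+0000..U+007F (ASCII, basic source characters).
    in_basic_latin = ord(ch) <= 0x7F
    # Unicode General Category "Separator, space" is abbreviated Zs.
    is_space_separator = unicodedata.category(ch) == "Zs"
    return in_basic_latin or is_space_separator


if __name__ == "__main__":
    for ch in ["a", "\u00a0", "\u3000", "\u00e9", "\u2028"]:
        print(f"U+{ord(ch):04X}", is_n3146_addition(ch))
```

Note that U+2028 LINE SEPARATOR has category Zl rather than Zs, so this
sketch does not flag it; whether [AltId] excludes it is a separate question
that the sketch does not answer.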

Section 7 also states, "The Unicode standard has since made stability
guarantees about identifiers, and created the XID_Start and XID_Continue
properties to alleviate the stability concerns that existed in 2010."
However, the Unicode 5.2 version of UAX #31 referenced by N3146 does
reference XID_Start and XID_Continue. It looks to me like the XID
properties have been around since at least 2005 and Unicode 4. Perhaps
the XID properties were not stable at that time? Regardless, it looks
like the quoted sentence needs an update.
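
For readers who want to see what an XID_Start/XID_Continue-based rule
accepts, a rough sketch in Python follows. It relies on str.isidentifier(),
which follows Python's own identifier grammar; that grammar is built on
XID_Start and XID_Continue but is only an approximation of what the paper
proposes for C++ (Python additionally allows a leading underscore and
applies NFKC normalization). The function name classify is hypothetical.

```python
def classify(candidates):
    """Map each candidate string to whether Python accepts it as an identifier.

    Used here only as an approximation of an XID_Start XID_Continue* rule;
    it is not the C++ wording under discussion.
    """
    return {s: s.isidentifier() for s in candidates}


if __name__ == "__main__":
    samples = ["na\u00efve", "\u53d8\u91cf", "_tmp", "2fast", "\U0001F600", "x\U0001F600"]
    for name, ok in classify(samples).items():
        print(repr(name), ok)
```

The emoji samples are rejected because U+1F600 has General Category So,
which is in neither XID_Start nor XID_Continue; this matches the Summary's
point that emoji will no longer be allowed in identifiers.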

In section 9.3, the sub-sections are arguably out of order. The first
two sub-sections are for R1 and R4 (requirements that are met), and the
remaining sub-sections list requirements that are not met (including
R1a, R1b, R2, and R3). I think the sub-section order should follow the
requirement order (R1, R1a, R1b, R2, R3, R4, ...).

In section 10, the end of the first paragraph appears to be missing an
"XID"; "... character classes XID_Start and _Continue."

In the wording for [lex.name]p1, the footnote is moved into the
paragraph, but still states "footnote" instead of "note". If this is
because Jens indicated this is how the editors expect relocation of a
footnote to be communicated, then ignore this comment.

In the wording for [lex.name]p1, the copied footnote text doesn't match
the WP. There is a missing "\u in".

In the annex wording for X.2 R1, can we avoid duplicating the grammar
specification from [lex.name]?

Tom.

On 5/26/20 4:51 PM, Steve Downey via SG16 wrote:
> Find attached a draft of the UAX31 paper for discussion.
> Viewable at
> http://htmlpreview.github.io/?https://github.com/steve-downey/papers/blob/master/generated/p1949.html
> Source at https://github.com/steve-downey/papers/blob/master/p1949.md
>
> (note that github doesn't format the same way that mpark's WG21 format
> does)
>

Received on 2020-05-27 00:28:00