Re: [SG16] D1949R4 - Unicode Identifiers

From: Steve Downey <sdowney_at_[hidden]>
Date: Wed, 27 May 2020 14:32:17 -0400

Updated per various reviewer comments.
Diff:
https://github.com/steve-downey/papers/commit/152d9e145e437a0f1377bd4850d83564f2feab6b#diff-a1a983c3f3fa85cc5db932bc8b0e7638

On Wed, May 27, 2020 at 2:21 PM Steve Downey <sdowney_at_[hidden]> wrote:

> Adopted the changes, except for the footnote, which corresponds to how the
> LaTeX is marked up, with the \footnote inline in the text. The footnote
> doesn't actually move; it's the rest of the text around it that moves.
>
> On Wed, May 27, 2020 at 1:24 AM Tom Honermann <tom_at_[hidden]> wrote:
>
>> Thanks, Steve. A few nit-picky comments below.
>>
>> In the new "Summary" section, in addition to noting that emoji will no
>> longer be allowed in identifiers, I think it would be helpful to note that
>> identifiers previously allowed for some scripts will no longer be allowed.
>> This is mentioned in section 6.1, but I think also worthy of mention in the
>> summary.
>>
>> In section 7, there is an instance of "C++. C++.".
>>
>> Section 7 states that N3146 "considered using UAX31". My reading of
>> N3146 is that it did use UAX #31, but it adapted what was then called the
>> "Alternative Identifier Syntax" option. Unicode 9 renamed "Alternative
>> Identifier Syntax" to "Immutable Identifiers". The relevant text from
>> N3146 is:
>>
>>     The set of UCNs *disallowed* in identifiers in C and C++ should exactly
>>     match the specification in [AltId], *with the following additions*: all
>>     characters in the Basic Latin (i.e. ASCII, basic source character) block,
>>     and all characters in the Unicode General Category "Separator, space".
>>
>> [AltId] corresponds to:
>>
>> Unicode Standard Annex #31: Unicode Identifier and Pattern Syntax,
>> "Alternative Identifier Syntax",
>> http://www.unicode.org/reports/tr31/tr31-11.html#Alternative_Identifier_Syntax
>>
>>
>> Section 7 also states, "The Unicode standard has since made stability
>> guarantees about identifiers, and created the XID_Start and XID_Continue
>> properties to alleviate the stability concerns that existed in 2010."
>> However, the Unicode 5.2 version of UAX #31 referenced by N3146 does
>> reference XID_Start and XID_Continue. It looks to me like the XID
>> properties have been around since at least 2005 and Unicode 4. Perhaps the
>> XID properties were not stable at that time? Regardless, it looks like the
>> quoted sentence needs an update.
>>
>> In section 9.3, the sub-sections are arguably out of order. The first
>> two sub-sections are for R1 and R4 (requirements that are met), and the
>> remaining sub-sections list requirements that are not met (including R1a,
>> R1b, R2, and R3). I think the sub-section order should follow the
>> requirement order (R1, R1a, R1b, R2, R3, R4, ...)
>>
>> In section 10, the end of the first paragraph appears to be missing an
>> "XID"; "... character classes XID_Start and _Continue."
>>
>> In the wording for [lex.name]p1, the footnote is moved into the
>> paragraph, but still states "footnote" instead of "note". If this is
>> because Jens indicated this is how the editors expect relocation of a
>> footnote to be communicated, then ignore this comment.
>>
>> In the wording for [lex.name]p1, the copied footnote text doesn't match
>> the WP. There is a missing "\u in".
>>
>> In the annex wording for X.2 R1, can we avoid duplicating the grammar
>> specification from [lex.name]?
>>
>> Tom.
>>
>> On 5/26/20 4:51 PM, Steve Downey via SG16 wrote:
>>
>> Find attached a draft of the UAX31 paper for discussion.
>> Viewable at
>> http://htmlpreview.github.io/?https://github.com/steve-downey/papers/blob/master/generated/p1949.html
>> Source at https://github.com/steve-downey/papers/blob/master/p1949.md
>>
>> (note that GitHub doesn't format the same way that mpark's WG21 format
>> does)
>>
>>
>>

Received on 2020-05-27 13:35:47