sg16: Re: [SG16] D1949R4 - Unicode Identifiers

From: Corentin Jabot <corentinjabot_at_[hidden]>
Date: Sun, 31 May 2020 14:53:16 +0200

I mentioned this on Slack, but I'm repeating it here for the benefit of everyone.

The grammar of pp-number is now such that

1\uxxxx is a pp-number
but
1'\uxxxx isn't

Notice the single quote.

For consistency, I propose to replace

pp-number ' digit
pp-number ' nondigit

by

pp-number ' identifier-continue

Is there any reason we shouldn't do that?
(An alternative would be to be more restrictive about what constitutes a
pp-number.)

Corentin

On Thu, 28 May 2020 at 00:06, Steve Downey via SG16 <sg16_at_[hidden]>
wrote:

> I will treat this as feedback for R5 of the paper. Barring errors, I am
> treating R4 as if it were submitted for the mailing, and will submit it
> before posting to the EWG reflector.
>
> On Wed, May 27, 2020, 17:43 Tom Honermann via SG16 <sg16_at_[hidden]>
> wrote:
>
>> Since we just polled a revision without updates to address Alisdair's
>> feedback below, I'd like to request that any updates done based on this
>> feedback go into a new revision. Let's treat these comments as if they are
>> early EWG feedback.
>>
>> Tom.
>>
>> On 5/27/20 3:45 PM, Alisdair Meredith via SG16 wrote:
>>
>> More non-technical feedback; none of this should affect SG16.
>>
>> For 6.1, do not assume a wider audience in EWG will immediately
>> know that ZWJ is a Zero Width Joiner. A clearer title or Capitalizing
>> The Term may help? Possibly state what a zero width joiner is
>> before noting that some scripts rely on them?
>>
>> (Yes, we can work this out easily enough, but I aspire to papers
>> that are simple to read without back-tracking).
>>
>> 8: after explaining Normalization Form C, conclude the example by
>> explaining how À is represented in this form.
>>
>>
>> Is it worth calling out in the ABI compatibility note that we would be
>> breaking compatibility between versions of the same compiler in
>> order to achieve compatibility between all compilers on that platform?
>> It is implied, but I think stating it explicitly would be clearer.
>>
>> 9.2 R3: are these the patterns that the pattern matching papers will
>> introduce a need for? Maybe add a ‘yet’ if so.
>>
>> AlisdairM
>>
>> On May 27, 2020, at 19:32, Steve Downey via SG16 <sg16_at_[hidden]>
>> wrote:
>>
>> Updated per various reviewer comments.
>> Diff:
>>
>> https://github.com/steve-downey/papers/commit/152d9e145e437a0f1377bd4850d83564f2feab6b#diff-a1a983c3f3fa85cc5db932bc8b0e7638
>>
>> On Wed, May 27, 2020 at 2:21 PM Steve Downey <sdowney_at_[hidden]> wrote:
>>
>>> Adopted the changes, except for the footnote, which corresponds to how
>>> the LaTeX is marked up, with the \footnote inline in the text. The footnote
>>> doesn't actually move; it's the rest of the text around it that moves.
>>>
>>> On Wed, May 27, 2020 at 1:24 AM Tom Honermann <tom_at_[hidden]> wrote:
>>>
>>>> Thanks, Steve. A few nit-picky comments below.
>>>>
>>>> In the new "Summary" section, in addition to noting that emoji will no
>>>> longer be allowed in identifiers, I think it would be helpful to note that
>>>> identifiers previously allowed for some scripts will no longer be allowed.
>>>> This is mentioned in section 6.1, but I think also worthy of mention in the
>>>> summary.
>>>>
>>>> In section 7, there is an instance of "C++. C++.".
>>>>
>>>> Section 7 states that N3146 "considered using UAX31". My reading of
>>>> N3146 is that it did use UAX #31, but it adopted what was then called the
>>>> "Alternative Identifier Syntax" option. Unicode 9 renamed "Alternative
>>>> Identifier Syntax" to "Immutable Identifiers". The relevant text from
>>>> N3146 is:
>>>>
>>>> The set of UCNs *disallowed* in identifiers in C and C++ should
>>>> exactly match the specification in [AltId], *with the following
>>>> additions*: all characters in the Basic Latin (i.e. ASCII, basic
>>>> source character) block, and all characters in the Unicode General Category
>>>> "Separator, space".
>>>>
>>>> [AltId] corresponds to:
>>>>
>>>> Unicode Standard Annex #31: Unicode Identifier and Pattern Syntax,
>>>> "Alternative Identifier Syntax",
>>>> http://www.unicode.org/reports/tr31/tr31-11.html#Alternative_Identifier_Syntax
>>>>
>>>>
>>>> Section 7 also states, "The Unicode standard has since made stability
>>>> guarantees about identifiers, and created the XID_Start and XID_Continue
>>>> properties to alleviate the stability concerns that existed in 2010."
>>>> However, the Unicode 5.2 version of UAX #31 referenced by N3146 does
>>>> reference XID_Start and XID_Continue. It looks to me like the XID
>>>> properties have been around since at least 2005 and Unicode 4. Perhaps the
>>>> XID properties were not stable at that time? Regardless, it looks like the
>>>> quoted sentence needs an update.
>>>>
>>>> In section 9.3, the sub-sections are arguably out of order. The first
>>>> two sub-sections are for R1 and R4 (requirements that are met), and the
>>>> remaining sub-sections list requirements that are not met (including R1a,
>>>> R1b, R2, and R3). I think the sub-section order should follow the
>>>> requirement order (R1, R1a, R1b, R2, R3, R4, ...).
>>>>
>>>> In section 10, the end of the first paragraph appears to be missing an
>>>> "XID"; "... character classes XID_Start and _Continue."
>>>>
>>>> In the wording for [lex.name]p1, the footnote is moved into the
>>>> paragraph, but still states "footnote" instead of "note". If this is
>>>> because Jens indicated this is how the editors expect relocation of a
>>>> footnote to be communicated, then ignore this comment.
>>>>
>>>> In the wording for [lex.name]p1, the copied footnote text doesn't
>>>> match the WP. There is a missing "\u in".
>>>>
>>>> In the annex wording for X.2 R1, can we avoid duplicating the grammar
>>>> specification from [lex.name]?
>>>>
>>>> Tom.
>>>>
>>>> On 5/26/20 4:51 PM, Steve Downey via SG16 wrote:
>>>>
>>>> Find attached a draft of the UAX31 paper for discussion.
>>>> Viewable at
>>>> http://htmlpreview.github.io/?https://github.com/steve-downey/papers/blob/master/generated/p1949.html
>>>> Source at https://github.com/steve-downey/papers/blob/master/p1949.md
>>>>
>>>> (note that GitHub doesn't format the same way that mpark's WG21 format
>>>> does)
>>>>
>>>>
>> --
>> SG16 mailing list
>> SG16_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16

Received on 2020-05-31 07:56:35