sg16: Re: [SG16] D1949R4 - Unicode Identifiers

From: Hubert Tong <hubert.reinterpretcast_at_[hidden]>
Date: Sun, 31 May 2020 12:55:02 -0400

On Sun, May 31, 2020 at 8:53 AM Corentin Jabot via SG16 <
sg16_at_[hidden]> wrote:

> I mentioned this on Slack but am repeating it here for the benefit of everyone.
>
> The grammar of pp-number is now such that
>
This is not a recent development.
*identifier-nondigit* was already less restrictive than *nondigit* in C++20.

>
> 1\uxxxx is a pp-number
> but
> 1'\uxxxx isn't
>
> notice the single quote.
>
> for consistency, I propose to replace
>
I believe that the paper does not change much about the status quo here, so
we should not consider introducing the change as part of the paper that is
the subject of this thread.

>
> pp-number ' digit
> pp-number ' nondigit
>
> by
>
> pp-number ' identifier-continue
>
> Any reason we shouldn't do that?
>
I believe you would give some well-formed programs undefined behaviour.
#define COMMENT(...)
COMMENT(1'\u00c0')
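
A simplified model of the pp-number rules may make the example concrete. This is an illustration under stated assumptions, not an implementation of [lex.ppnumber]: UCNs such as \u00c0 are assumed to be already decoded to a single code point, every non-ASCII code point is treated as an identifier-nondigit (a placeholder for the real character tables), and the names `pp_number_length` and `proposed` are hypothetical. The `proposed` flag switches `pp-number ' X` from accepting only a digit or nondigit to accepting any identifier character, as suggested in the quoted message.

```cpp
#include <cstddef>
#include <string>

// Simplified model, not the standard's wording. ASSUMPTIONS: the input is a
// sequence of code points in which a UCN has already been decoded (so the
// spelling \u00c0 arrives as the single code point U+00C0), and every
// non-ASCII code point counts as an identifier-nondigit.

bool is_digit(char32_t c) { return c >= U'0' && c <= U'9'; }

bool is_nondigit(char32_t c) {
    return (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z') || c == U'_';
}

// Placeholder for the real identifier character tables.
bool is_identifier_nondigit(char32_t c) { return is_nondigit(c) || c > 0x7F; }

// Length of the longest pp-number at the start of `s` (0 if none).
// proposed == false: "pp-number ' X" requires X to be a digit or nondigit.
// proposed == true:  X may be any identifier character (the suggested change).
std::size_t pp_number_length(const std::u32string& s, bool proposed) {
    std::size_t i = 0;
    if (!s.empty() && is_digit(s[0])) {
        i = 1;  // digit
    } else if (s.size() >= 2 && s[0] == U'.' && is_digit(s[1])) {
        i = 2;  // . digit
    } else {
        return 0;
    }
    while (i < s.size()) {
        const char32_t c = s[i];
        const bool has_next = i + 1 < s.size();
        if ((c == U'e' || c == U'E' || c == U'p' || c == U'P') && has_next &&
            (s[i + 1] == U'+' || s[i + 1] == U'-')) {
            i += 2;  // pp-number e sign, E sign, p sign, P sign
        } else if (is_digit(c) || is_identifier_nondigit(c) || c == U'.') {
            i += 1;  // pp-number digit / identifier-nondigit / .
        } else if (c == U'\'' && has_next) {
            const char32_t next = s[i + 1];
            const bool ok = is_digit(next) ||
                            (proposed ? is_identifier_nondigit(next) : is_nondigit(next));
            if (!ok) break;
            i += 2;  // pp-number ' digit / ' nondigit (or ' identifier char)
        } else {
            break;
        }
    }
    return i;
}
```

Under the current rules, `1'\u00c0'` yields a pp-number of length 1, so the rest of the macro argument can lex as the character literal `'\u00c0'`. With the proposed rule the pp-number absorbs `1'\u00c0`, leaving a lone `'`, which has undefined behaviour in C++20 because it does not begin a valid token.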

> (An alternative would be to be more restrictive as to what constitutes a
> pp-number)
>
I am not sure we have a concrete technical problem here. For a more general
enforcement of "design", I think we'll need more information behind the
intent of *pp-number* and an analysis of the general directions that can be
taken, along with the consequences of each direction. In other words, if
there is good motivation for the committee to spend time on this, please
research and bring a paper.

>
> Corentin
>
> On Thu, 28 May 2020 at 00:06, Steve Downey via SG16 <sg16_at_[hidden]>
> wrote:
>
>> I will treat this as feedback for R5 of the paper. Barring errors, I am
>> treating R4 as if it were submitted for mailing, and will do so before
>> posting to the EWG reflector.
>>
>> On Wed, May 27, 2020, 17:43 Tom Honermann via SG16 <sg16_at_[hidden]>
>> wrote:
>>
>>> Since we just polled a revision without updates to address Alisdair's
>>> feedback below, I'd like to request that any updates done based on this
>>> feedback go into a new revision. Let's treat these comments as if they are
>>> early EWG feedback.
>>>
>>> Tom.
>>>
>>> On 5/27/20 3:45 PM, Alisdair Meredith via SG16 wrote:
>>>
>>> More non-technical feedback, none of this should affect SG16.
>>>
>>> For 6.1, do not assume a wider audience in EWG will immediately
>>> know that ZWJ is a Zero Width Joiner. A clearer title or Capitalizing
>>> The Term may help? Possibly state what a zero width joiner is
>>> before noting that some scripts rely on them?
>>>
>>> (Yes, we can work this out easily enough, but I aspire to papers
>>> that are simple to read without back-tracking).
>>>
>>> 8: after explaining normalization form C, conclude the example by
>>> explaining how À is represented in this form.
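
For reference, a minimal illustration of the point: in NFC, À is the single precomposed code point U+00C0 LATIN CAPITAL LETTER A WITH GRAVE, while the canonically equivalent decomposed spelling is U+0041 followed by U+0300 COMBINING GRAVE ACCENT. Compared code point by code point the two spellings differ, which is why a normalization rule for identifiers matters. The variable names are hypothetical.

```cpp
#include <string>

// "À" in Normalization Form C: one precomposed code point.
const std::u32string a_grave_nfc = U"\u00C0";   // LATIN CAPITAL LETTER A WITH GRAVE

// The canonically equivalent decomposed spelling (NFD):
// base letter followed by a combining mark.
const std::u32string a_grave_nfd = U"A\u0300";  // 'A' + COMBINING GRAVE ACCENT

// Without normalization, these would spell two different identifiers.
const bool same_spelling = (a_grave_nfc == a_grave_nfd);  // false
```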
>>>
>>>
>>> Is it worth calling out in the ABI compatibility note that we would be
>>> breaking compatibility between versions of the same compiler, in
>>> order to achieve compatibility between all compilers on that platform?
>>> It is inferred, but I think might be clearer.
>>>
>>> 9.2 R3: are these the patterns that the pattern matching papers will
>>> introduce a need for? Maybe add a ‘yet’ if so.
>>>
>>> AlisdairM
>>>
>>> On May 27, 2020, at 19:32, Steve Downey via SG16 <sg16_at_[hidden]>
>>> wrote:
>>>
>>> Updated per various reviewer comments.
>>> Diff:
>>>
>>> https://github.com/steve-downey/papers/commit/152d9e145e437a0f1377bd4850d83564f2feab6b#diff-a1a983c3f3fa85cc5db932bc8b0e7638
>>>
>>> On Wed, May 27, 2020 at 2:21 PM Steve Downey <sdowney_at_[hidden]> wrote:
>>>
>>>> Adopted the changes, except for the footnote, which corresponds to how
>>>> the LaTeX is marked up, with the \footnote inline in the text. The footnote
>>>> doesn't actually move, it's the rest of the text around it.
>>>>
>>>> On Wed, May 27, 2020 at 1:24 AM Tom Honermann <tom_at_[hidden]>
>>>> wrote:
>>>>
>>>>> Thanks, Steve. A few nit-picky comments below.
>>>>>
>>>>> In the new "Summary" section, in addition to noting that emoji will no
>>>>> longer be allowed in identifiers, I think it would be helpful to note that
>>>>> identifiers previously allowed for some scripts will no longer be allowed.
>>>>> This is mentioned in section 6.1, but I think also worthy of mention in the
>>>>> summary.
>>>>>
>>>>> In section 7, there is an instance of "C++. C++.".
>>>>>
>>>>> Section 7 states that N3146 "considered using UAX31". My reading of
>>>>> N3146 is that it did use UAX #31, but it adapted what was then called the
>>>>> "Alternative Identifier Syntax" option. Unicode 9 renamed "Alternative
>>>>> Identifier Syntax" to "Immutable Identifiers". The relevant text from
>>>>> N3146 is:
>>>>>
>>>>> The set of UCNs *disallowed* in identifiers in C and C++ should
>>>>> exactly match the specification in [AltId], *with the following
>>>>> additions*: all characters in the Basic Latin (i.e. ASCII, basic
>>>>> source character) block, and all characters in the Unicode General Category
>>>>> "Separator, space".
>>>>>
>>>>> [AltId] corresponds to:
>>>>>
>>>>> Unicode Standard Annex #31: Unicode Identifier and Pattern Syntax,
>>>>> "Alternative Identifier Syntax",
>>>>> http://www.unicode.org/reports/tr31/tr31-11.html#Alternative_Identifier_Syntax
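
Read as a set union, the N3146 rule quoted above can be sketched as follows. The [AltId] membership test needs Unicode data that this sketch does not reproduce, so it is passed in as a hypothetical callback, and the function names are illustrative only. The Zs list reflects current Unicode versions; U+180E MONGOLIAN VOWEL SEPARATOR was also classified Zs in older versions, including Unicode 5.2.

```cpp
// Unicode General Category Zs ("Separator, space"), current Unicode versions.
bool is_space_separator(char32_t c) {
    return c == 0x0020 || c == 0x00A0 || c == 0x1680 ||
           (c >= 0x2000 && c <= 0x200A) || c == 0x202F || c == 0x205F ||
           c == 0x3000;
}

// Hypothetical hook: true if `c` is excluded by UAX #31's "Alternative
// Identifier Syntax". A real implementation would be generated from the
// Unicode data files.
using AltIdDisallowedFn = bool (*)(char32_t);

// N3146: disallowed = [AltId] set + Basic Latin block + Zs.
bool ucn_disallowed_in_identifier(char32_t c, AltIdDisallowedFn altid_disallowed) {
    return altid_disallowed(c) || c <= 0x007F || is_space_separator(c);
}
```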
>>>>>
>>>>>
>>>>> Section 7 also states, "The Unicode standard has since made stability
>>>>> guarantees about identifiers, and created the XID_Start and XID_Continue
>>>>> properties to alleviate the stability concerns that existed in 2010."
>>>>> However, the Unicode 5.2 version of UAX #31 referenced by N3146 does
>>>>> reference XID_Start and XID_Continue. It looks to me like the XID
>>>>> properties have been around since at least 2005 and Unicode 4. Perhaps the
>>>>> XID properties were not stable at that time? Regardless, it looks like the
>>>>> quoted sentence needs an update.
>>>>>
>>>>> In section 9.3, the sub-sections are arguably out of order. The first
>>>>> two sub-sections are for R1 and R4 (requirements that are met), and the
>>>>> remaining sub-sections list requirements that are not met (including R1a,
>>>>> R1b, R2, and R3). I think the sub-section order should follow the
>>>>> requirement order (R1, R1a, R1b, R2, R3, R4, ...)
>>>>>
>>>>> In section 10, the end of the first paragraph appears to be missing an
>>>>> "XID"; "... character classes XID_Start and _Continue."
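
As a rough sketch of the XID_Start/XID_Continue rule the paper builds on, assuming the shape identifier := (XID_Start | `_`) XID_Continue*: the property tests below are ASCII-only stand-ins for the real Unicode tables, the NFC check is omitted, and all names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// ASCII-only stand-ins for the Unicode XID_Start and XID_Continue
// properties; a real implementation would use the Unicode data tables.
bool xid_start_ascii(char32_t c) {
    return (c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z');
}

bool xid_continue_ascii(char32_t c) {
    return xid_start_ascii(c) || (c >= U'0' && c <= U'9') || c == U'_';
}

using PropertyFn = bool (*)(char32_t);

// identifier: (XID_Start | '_') XID_Continue*
// Normalization (NFC) is not checked in this sketch.
bool is_identifier(const std::u32string& s,
                   PropertyFn xid_start = xid_start_ascii,
                   PropertyFn xid_continue = xid_continue_ascii) {
    if (s.empty() || !(xid_start(s[0]) || s[0] == U'_')) {
        return false;
    }
    for (std::size_t i = 1; i < s.size(); ++i) {
        if (!xid_continue(s[i])) {
            return false;
        }
    }
    return true;
}
```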
>>>>>
>>>>> In the wording for [lex.name]p1, the footnote is moved into the
>>>>> paragraph, but still states "footnote" instead of "note". If this is
>>>>> because Jens indicated this is how the editors expect relocation of a
>>>>> footnote to be communicated, then ignore this comment.
>>>>>
>>>>> In the wording for [lex.name]p1, the copied footnote text doesn't
>>>>> match the WP. There is a missing "\u in".
>>>>>
>>>>> In the annex wording for X.2 R1, can we avoid duplicating the grammar
>>>>> specification from [lex.name]?
>>>>>
>>>>> Tom.
>>>>>
>>>>> On 5/26/20 4:51 PM, Steve Downey via SG16 wrote:
>>>>>
>>>>> Find attached a draft of the UAX31 paper for discussion.
>>>>> Viewable at
>>>>> http://htmlpreview.github.io/?https://github.com/steve-downey/papers/blob/master/generated/p1949.html
>>>>> Source at https://github.com/steve-downey/papers/blob/master/p1949.md
>>>>>
>>>>> (note that GitHub doesn't format the same way that mpark's WG21 format
>>>>> does)
>>>>>
>>>>>
>>> --
>>> SG16 mailing list
>>> SG16_at_[hidden]
>>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>>

Received on 2020-05-31 11:58:28