Re: [SG16] Revised and repaired P1949R2 - C++ Identifier Syntax using Unicode Standard Annex 31

From: Hubert Tong <hubert.reinterpretcast_at_[hidden]>
Date: Sun, 23 Feb 2020 12:48:20 -0500
On Sun, Feb 23, 2020 at 11:19 AM Steve Downey <sdowney_at_[hidden]> wrote:

> My intent in listing all of them in the body of the paper was to address
> each of them, not to claim conformance to each of them.
>
Being upfront on whether or not each one is met would be helpful.


> The intent in listing the syntax characters was to affirm that no new
> characters are being allowed for keywords or parts of the grammar other
> than identifiers, where universal character names are used now. If that's
> overspecifying, then yes dropping it makes sense.
>
I believe it is actively harmful to describe any profile of UAX31-R3 in the
Annex in the current context of the syntax of identifier tokens in C++
source files. Applying UAX31-R3 may be appropriate with respect to regular
expression patterns. This also means that the Annex should be titled less
broadly. It is the application of UAX #31 *to identifiers* (and not in
general).


>
> On Sun, Feb 23, 2020, 08:16 Hubert Tong via SG16 <sg16_at_[hidden]>
> wrote:
>
>> (Collecting the list back into one place; a more careful reading leads me
>> to believe LOW LINE is handled properly)
>>
>> On Sat, Feb 22, 2020 at 8:09 PM Hubert Tong <
>> hubert.reinterpretcast_at_[hidden]> wrote:
>>
>>> On Sat, Feb 22, 2020 at 3:37 PM Jens Maurer via SG16 <
>>> sg16_at_[hidden]> wrote:
>>>
>>>> On 22/02/2020 19.46, Steve Downey via SG16 wrote:
>>>> > Attached are the revised P1949 paper, pulling back the changes from
>>>> > D1949R2 as presented, and the original source. I've also fixed up
>>>> > formatting problems. The markdown is the original source and is
>>>> > transformed using Michael Park's wg21 paper system:
>>>> > https://github.com/mpark/wg21 and
>>>> > https://mpark.github.io/programming/2018/11/16/how-i-format-my-cpp-papers/
>>>> >
>>>> > Source for P1949 is
>>>> > https://github.com/steve-downey/papers/blob/master/p1949.md
>>>> >
>>>> > As this is intended to capture the paper as presented, please just
>>>> > check that it is accurate in that regard. However, I'm absolutely open
>>>> > to any edits or suggestions for R3. In particular, for R3 (or R4) in
>>>> > Varna I'll want to make sure the body of the paper reflects and supports
>>>> > the wording, because I hate when they don't.
>>>>
>>>> Brief comments on the proposed wording for R3 or so:
>>>>
>>>> - Add periods at the end of the sentences in the new Annex.
>>>> - X.7 needs an empty line
>>>> - Add cross-references from the Annex to the main text where
>>>> the normative statement appears.
>>>> - The underline seems to be forgotten in "start" in lex.name
>>>> - diff.cpp20.lex "affected subclause" 5.10: also add section label
>>>> - When referring to ISO 10646, it would probably be helpful to
>>>> also give a section number for the rule being relied upon.
>>>>
>>> - Steve, please add subclause and paragraph numbers (alongside the
>>> stable name for the subclause) based on an identified IS working draft.
>>> - The references to ISO 10646 in the first changed paragraph appear to
>>> be wrong. There is no XID_CONTINUE class in ISO/IEC 10646:2017. The
>>> reference should be to UAX #44 (please add to the normative references).
>>> Also, the property name is "XID_Continue".
>>> - Suggestion:
>>> An identifier shall<del> conform to the</del><ins>be in</ins> NFC<del>
>>> normalization</del><ins> as</ins> specified in ISO/IEC 10646.
>>> <del>If an identifier is encountered that does not, the program is
>>> ill-formed.</del>
>>> - I suggest having a note in the paper that the NFC restriction only
>>> comes into effect for user-defined-integer-literals and
>>> user-defined-floating-point-literals at phase 7 of translation.
>>> - Was EWG informed that the proposal can be understood as introducing
>>> new core language undefined behaviour for existing programs with
>>> identifiers in NFC form where the concatenation is not in NFC form
>>> ([cpp.concat])? I note that R1 of the paper was not clear on that point and
>>> R2 does not identify it as a consideration.
>>> - Is there an intent to bring the paper to WG 14?
>>> - In the Annex D entry, the ISO 10646 references should be to UAX #31.
>>> UAX #31 should be added to the bibliography.
>>> - In the Annex D entry, do not claim that identifiers can only be
>>> written a single way. That statement is not true before phase 5 of
>>> translation.
>>> - The new Annex should be marked as being informative only.
>>> - In the new Annex, UAX #31 should be referenced as "UAX #31".
>>> - The C++ Standard uses "white-space" as an adjective modifying
>>> "characters".
>>> - Missing "s" in the introduction to the list of white-space characters.
>>> - The "minus space" should be "minus the white-space characters".
>>> - The code font tables should be formatted as real tables with headings
>>> and no extraneous fields.
>>>
>> ^ Regarding the code font tables in the new Annex: I believe they're
>> going away anyway (see below).
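The [cpp.concat] concern above can be illustrated with Hangul jamo. The identifiers x\u1100 (ending in U+1100 HANGUL CHOSEONG KIYEOK) and \u1161y (starting with U+1161 HANGUL JUNGSEONG A) are each in NFC on their own, but pasting them with ## yields the adjacent pair U+1100 U+1161, which NFC composes to U+AC00. The sketch below is deliberately narrow: it models only the Unicode Standard's algorithmic Hangul L+V composition (an assumption that keeps it self-contained; a real NFC check needs the full normalization data), and the helper names are hypothetical, not taken from the paper.

```cpp
#include <string_view>

// Constants from the Unicode Standard's algorithmic Hangul composition
// (Conjoining Jamo Behavior). Only the L+V case is modelled here.
constexpr char32_t SBase = 0xAC00, LBase = 0x1100, VBase = 0x1161;
constexpr char32_t LCount = 19, VCount = 21, TCount = 28;

constexpr bool is_leading_jamo(char32_t c) { return c >= LBase && c < LBase + LCount; }
constexpr bool is_vowel_jamo(char32_t c) { return c >= VBase && c < VBase + VCount; }

// Precomposed LV syllable that NFC produces for a leading + vowel jamo pair.
constexpr char32_t compose_lv(char32_t l, char32_t v) {
    return SBase + ((l - LBase) * VCount + (v - VBase)) * TCount;
}

// HYPOTHETICAL helper: true if pasting a ## b puts an L+V jamo pair across
// the seam, so the concatenation is not in NFC even when a and b each are.
// Illustration (not compiled here): with #define CAT(a, b) a ## b,
// CAT(x\u1100, \u1161y) forms x\u1100\u1161y, whose NFC form is x\uAC00y.
constexpr bool paste_forms_hangul_lv(std::u32string_view a, std::u32string_view b) {
    return !a.empty() && !b.empty() && is_leading_jamo(a.back()) && is_vowel_jamo(b.front());
}
```

Other composing boundaries (for example a base letter followed by a combining mark, or an LV syllable followed by a trailing jamo) would need the general canonical composition algorithm.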
>>
>> - The new Annex is only partially written in the form of claiming
>> conformance to UAX #31. Of note is that not all "requirements" need to be
>> met. Additionally, the statement given in the proposed text in some cases
>> has little bearing on the requirement. UAX31-R1a is a requirement to
>> *allow* additional format characters (with restrictions). The statement
>> that there are no additional restrictions seems to imply the requirement is
>> met, but it is not.
>>
>> An example of statements indicating whether the "requirements" are met or
>> not:
>> - The UAX31-R1 requirement is met using a profile, setting
>> Start := XID_Start, plus U+005F LOW LINE
>> Continue := Start + XID_Continue
>> Medial := []
>> - The UAX31-R1a requirement is not met. U+200C ZERO WIDTH NON-JOINER and
>> U+200D ZERO WIDTH JOINER are not allowed by the profile of UAX31-R1
>> described.
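To make the UAX31-R1 profile above concrete, here is a minimal C++17 sketch of a classifier for the default identifier syntax <Start> <Continue>* under that profile. The XID_Start and XID_Continue predicates are hypothetical ASCII-only stand-ins (an assumption; a real implementation would consult the Unicode Character Database), and all function names are illustrative rather than taken from the paper.

```cpp
#include <cstddef>
#include <string_view>

// HYPOTHETICAL stand-ins for the XID_Start / XID_Continue properties
// (defined in UAX #44). ASCII-only approximations, for illustration only.
constexpr bool is_xid_start(char32_t c) {
    return (c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z');
}

constexpr bool is_xid_continue(char32_t c) {
    // U+005F LOW LINE is XID_Continue but not XID_Start.
    return is_xid_start(c) || (c >= U'0' && c <= U'9') || c == U'_';
}

// The profile stated in the thread:
//   Start    := XID_Start, plus U+005F LOW LINE
//   Continue := Start + XID_Continue
//   Medial   := []  (empty)
constexpr bool is_start(char32_t c) { return is_xid_start(c) || c == U'_'; }
constexpr bool is_continue(char32_t c) { return is_start(c) || is_xid_continue(c); }

// Default identifier syntax under this profile: <Start> <Continue>*.
// No Medial class and no ZWJ/ZWNJ allowance, hence UAX31-R1a is not met.
constexpr bool is_identifier(std::u32string_view s) {
    if (s.empty() || !is_start(s.front())) return false;
    for (std::size_t i = 1; i < s.size(); ++i) {
        if (!is_continue(s[i])) return false;
    }
    return true;
}
```

Under this sketch, U"_tmp1" is accepted, while U"1tmp" and U"a\u200Db" (containing U+200D ZERO WIDTH JOINER) are rejected, matching the R1 and R1a statements above.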
>>
>> Please review for all of the requirements. In particular, UAX31-R3 is not
>> met and not meant to be met in the context of parsing C++. The statement
>> should say:
>> - The UAX31-R3 requirement is not met and is not applicable. C++ is not a
>> pattern language.
>>
>>
>>>
>>>
>>>>
>>>> Jens
>>>> --
>>>> SG16 mailing list
>>>> SG16_at_[hidden]
>>>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>>>
>>
>

Received on 2020-02-23 11:51:18