Re: [SG16] Revised and repaired P1949R2 - C++ Identifier Syntax using Unicode Standard Annex 31

From: Steve Downey <sdowney_at_[hidden]>
Date: Sun, 23 Feb 2020 11:19:46 -0500
My intent in listing all of them in the body of the paper was to address
each of them, not to claim conformance to each of them. The intent in
listing the syntax characters was to affirm that no new characters are
being allowed for keywords or parts of the grammar other than identifiers,
where universal character names are used now. If that's overspecifying,
then yes, dropping it makes sense.



On Sun, Feb 23, 2020, 08:16 Hubert Tong via SG16 <sg16_at_[hidden]>
wrote:

> (Collecting the list back into one place; a more careful reading leads me
> to believe LOW LINE is handled properly)
>
> On Sat, Feb 22, 2020 at 8:09 PM Hubert Tong <
> hubert.reinterpretcast_at_[hidden]> wrote:
>
>> On Sat, Feb 22, 2020 at 3:37 PM Jens Maurer via SG16 <
>> sg16_at_[hidden]> wrote:
>>
>>> On 22/02/2020 19.46, Steve Downey via SG16 wrote:
>>> > Attached are the revised P1949 paper pulling back the changes from
>>> D1949R2 as presented and the original source. I've also fixed up formatting
>>> problems. The markdown is the original source and is transformed using
>>> Michael Park's wg21 paper system. https://github.com/mpark/wg21 and
>>> https://mpark.github.io/programming/2018/11/16/how-i-format-my-cpp-papers/
>>> .
>>> >
>>> > Source for p1949 is
>>> https://github.com/steve-downey/papers/blob/master/p1949.md
>>> >
>>> > As this is intended to capture the paper as presented, please just
>>> review that it is accurate in that regard. However, I'm absolutely open to
>>> any edits or suggestions for R3. In particular, for R3 (or 4) in Varna I'll
>>> want to make sure the body of the paper reflects and supports the wording,
>>> because I hate when they don't.
>>>
>>> Brief comments on the proposed wording for R3 or so:
>>>
>>> - Add periods at the end of the sentences in the new Annex.
>>> - X.7 needs an empty line
>>> - Add cross-references from the Annex to the main text where
>>> the normative statement appears.
>>> - The underline seems to be forgotten in "start" in lex.name
>>> - diff.cpp20.lex "affected subclause" 5.10: also add section label
>>> - When referring to ISO 10646, it would probably be helpful to
>>> also give a section number for the rule being relied upon.
>>>
>> - Steve, please add subclause and paragraph numbers (alongside the stable
>> name for the subclause) based on an identified IS working draft.
>> - The references to ISO 10646 in the first changed paragraph appear to be
>> wrong. There is no XID_CONTINUE class in ISO/IEC 10646:2017. The reference
>> should be to UAX #44 (please add to the normative references). Also, the
>> property name is "XID_Continue".
>> - Suggestion:
>> An identifier shall<del> conform to the</del><ins>be in</ins> NFC<del>
>> normalization</del><ins> as</ins> specified in ISO/IEC 10646.
>> <del>If an identifier is encountered that does not, the program is
>> ill-formed.</del>
>> - I suggest having a note in the paper that the NFC restriction only
>> comes into effect for user-defined-integer-literals and
>> user-defined-floating-point-literals at phase 7 of translation.
>> - Was EWG informed that the proposal can be understood as introducing new
>> core language undefined behaviour for existing programs with identifiers in
>> NFC form where the concatenation is not in NFC form ([cpp.concat])? I note
>> that R1 of the paper was not clear on that point and R2 does not identify
>> it as a consideration.
>> - Is there an intent to bring the paper to WG 14?
>> - In the Annex D entry, the ISO 10646 references should be to UAX #31.
>> UAX #31 should be added to the bibliography.
>> - In the Annex D entry, do not claim that identifiers can only be written
>> a single way. That statement is not true before phase 5 of translation.
>> - The new Annex should be marked as being informative only.
>> - In the new Annex, UAX #31 should be referenced as "UAX #31".
>> - The C++ Standard uses "white-space" as an adjective modifying
>> "characters".
>> - Missing "s" in the introduction to the list of white-space characters.
>> - The "minus space" should be "minus the white-space characters".
>> - The code font tables should be formatted as real tables with headings
>> and no extraneous fields.
>>
> ^ Regarding the code font tables in the new Annex: I believe they're going
> away anyway (see below).
>
> - The new Annex is only partially written in the form of claiming
> conformance to UAX #31. Of note is that not all "requirements" need to be
> met. Additionally, the statement given in the proposed text in some cases
> has little bearing on the requirement. UAX31-R1a is a requirement to
> *allow* additional format characters (with restrictions). The statement
> that there are no additional restrictions seems to imply the requirement is
> met, but it is not.
>
> An example of statements indicating whether the "requirements" are met or
> not:
> - The UAX31-R1 requirement is met using a profile, setting
> Start := XID_Start, plus U+005F LOW LINE
> Continue := Start + XID_Continue
> Medial := []
> - The UAX31-R1a requirement is not met. U+200C ZERO WIDTH NON-JOINER and
> U+200D ZERO WIDTH JOINER are not allowed by the profile of UAX31-R1
> described.
>
> Please review for all of the requirements. In particular, UAX31-R3 is not
> met and not meant to be met in the context of parsing C++. The statement
> should say:
> - The UAX31-R3 requirement is not met and is not applicable. C++ is not a
> pattern language.
>
>
>>
>>
>>>
>>> Jens
>>> --
>>> SG16 mailing list
>>> SG16_at_[hidden]
>>> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>>
>> --
> SG16 mailing list
> SG16_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>
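
Two points from the review above can be checked mechanically: the UAX31-R1 profile (Start := XID_Start plus U+005F LOW LINE, Continue := Start + XID_Continue, Medial := []) and the [cpp.concat] concern that two identifiers which are each in NFC can concatenate to something that is not. What follows is only a rough sketch in Python, not wording from the paper and not how any C++ compiler implements this. It assumes that Python's `str.isidentifier()` tests XID_Start (plus underscore) and XID_Continue, as the Python language reference describes, and that the Unicode data bundled with the interpreter is recent enough. The function names are hypothetical.

```python
import unicodedata


def is_start(ch: str) -> bool:
    """Start := XID_Start plus U+005F LOW LINE (assumed to match str.isidentifier)."""
    return len(ch) == 1 and ch.isidentifier()


def is_continue(ch: str) -> bool:
    """Continue := Start + XID_Continue, tested by placing ch after a valid start."""
    return len(ch) == 1 and ("_" + ch).isidentifier()


def matches_profile(name: str) -> bool:
    """True if name fits the profile: one Start character, then Continue characters."""
    return (
        bool(name)
        and is_start(name[0])
        and all(is_continue(c) for c in name[1:])
    )


def is_nfc(text: str) -> bool:
    """True if text is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", text) == text


def concat_breaks_nfc(left: str, right: str) -> bool:
    """True if left and right are each in NFC but their concatenation is not."""
    return is_nfc(left) and is_nfc(right) and not is_nfc(left + right)


if __name__ == "__main__":
    # U+200D ZERO WIDTH JOINER is outside XID_Continue, so the profile rejects it.
    print(matches_profile("a\u200db"))
    # Hangul jamo U+1100 and U+1161 are each in NFC, but together they
    # compose to the syllable U+AC00, so the concatenation is not in NFC.
    print(concat_breaks_nfc("\u1100", "\u1161"))
```

Under these assumptions, `matches_profile` rejects an identifier containing U+200C or U+200D, which is consistent with the observation that UAX31-R1a is not met, and the Hangul jamo pair illustrates the kind of token-pasting result the [cpp.concat] question is about.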

Received on 2020-02-23 10:22:38