I don't recall discussing this in EWG, but I believe it was discussed at some point and the conclusion was that an identifier not in NFC form resulting from such concatenation was ill-formed. Can you elaborate as to how the UB manifests?

On Sat, Feb 22, 2020 at 3:37 PM Jens Maurer via SG16 <sg16@lists.isocpp.org> wrote:
On 22/02/2020 19.46, Steve Downey via SG16 wrote:
> Attached are the revised P1949 paper pulling back the changes from D1949R2 as presented and the original source. I've also fixed up formatting problems. The markdown is the original source and is transformed using Michael Park's wg21 paper system. https://github.com/mpark/wg21 and https://mpark.github.io/programming/2018/11/16/how-i-format-my-cpp-papers/ .
>
> Source for p1949 is https://github.com/steve-downey/papers/blob/master/p1949.md
>
> As this is intended to capture the paper as presented, please just review that it is accurate in that regard. However, I'm absolutely open to any edits or suggestions for R3. In particular, for R3 (or 4) in Varna I'll want to make sure the body of the paper reflects and supports the wording, because I hate when they don't.
Brief comments on the proposed wording for R3 or so:
- Add periods at the end of the sentences in the new Annex.
- X.7 needs an empty line
- Add cross-references from the Annex to the main text where
the normative statement appears.
- The underline seems to be forgotten in "start" in lex.name
- diff.cpp20.lex "affected subclause" 5.10: also add section label
- When referring to ISO 10646, it would probably be helpful to
also give a section number for the rule being relied upon.
- Steve, please add subclause and paragraph numbers (alongside the stable name for the subclause) based on an identified IS working draft.
- The references to ISO 10646 in the first changed paragraph appear to be wrong. There is no XID_CONTINUE class in ISO/IEC 10646:2017. The reference should be to UAX #44 (please add to the normative references). Also, the property name is "XID_Continue".
- Suggestion: An identifier shall<del> conform to the</del><ins> be in</ins> NFC<del> normalization</del><ins> as</ins> specified in ISO/IEC 10646.
<del>If an identifier is encountered that does not, the program is ill-formed.</del>
- I suggest having a note in the paper that the NFC restriction only comes into effect for user-defined-integer-literals and user-defined-floating-point-literals at phase 7 of translation.
- Was EWG informed that the proposal can be understood as introducing new core language undefined behaviour for existing programs with identifiers in NFC form where the concatenation is not in NFC form ([cpp.concat])? I note that R1 of the paper was not clear on that point and R2 does not identify it as a consideration.
- Is there an intent to bring the paper to WG 14?
We have a github issue to track doing so:
- https://github.com/sg16-unicode/sg16/issues/56
Tom.
- In the Annex D entry, the ISO 10646 references should be to UAX #31. UAX #31 should be added to the bibliography.
- In the Annex D entry, do not claim that identifiers can only be written a single way. That statement is not true before phase 5 of translation.
- The new Annex should be marked as being informative only.
- In the new Annex, UAX #31 should be referenced as "UAX #31".
- The C++ Standard uses "white-space" as an adjective modifying "characters".
- Missing "s" in the introduction to the list of white-space characters.
- The "minus space" should be "minus the white-space characters".
- The code font tables should be formatted as real tables with headings and no extraneous fields.
Jens
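
As a rough illustration of the [cpp.concat] concern raised in the comments above, the following Python sketch models token pasting as plain string concatenation. The two code points (U+1100 and U+1161, conjoining Hangul jamo) are an assumed example and are not taken from the paper, and Python's unicodedata module only stands in for whatever normalization check an implementation might perform. Each piece is in NFC on its own, but the pasted result is not, because NFC composes the pair into the single syllable U+AC00.

```python
import unicodedata


def is_nfc(s: str) -> bool:
    """Return True if s is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", s) == s


# Hypothetical identifier fragments, chosen only for illustration.
left = "\u1100"   # HANGUL CHOSEONG KIYEOK
right = "\u1161"  # HANGUL JUNGSEONG A

# Models the effect of `left ## right` as simple concatenation.
pasted = left + right

# NFC composes the leading consonant and vowel into one syllable.
composed = unicodedata.normalize("NFC", pasted)

print("left in NFC:  ", is_nfc(left))
print("right in NFC: ", is_nfc(right))
print("pasted in NFC:", is_nfc(pasted))
print("NFC of pasted:", [f"U+{ord(c):04X}" for c in composed])
```

Whether such a pasted identifier should be ill-formed or undefined is the open question in the thread; the sketch only shows that the situation can arise.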