Agreed that the wording should use that term. Updated in a new draft revision: https://rawgit.com/sg16-unicode/sg16/master/papers/d2029r3.html.
On Tue, 14 Jul 2020, 22:56 Tom Honermann, <tom@honermann.net> wrote:
On 7/14/20 3:23 AM, Richard Smith wrote:
No problem, thank you for the feedback!
On Mon, Jul 13, 2020 at 9:03 PM Tom Honermann via Core <core@lists.isocpp.org> wrote:
On 7/8/20 1:54 PM, Tom Honermann wrote:
On 7/8/20 6:43 AM, Alisdair Meredith wrote:
That seems reasonable to me, I'll do so.
Minor nit: I dislike normatively stating that a null character is appended after string concatenation in two places. I do like the addition of this directly to the phase 6 wording, so suggest that the original in [lex.string]p12 with its extra flowery language be demoted to a note.
After looking at this again, I elected to go in a different direction.
[lex.phases] describes at a high level what is to be done in each phase and more-or-less defers to other sections for elaboration. From this lens, changing the normative text in [lex.string] into a note felt like the wrong direction. Instead, I chose to update the wording in [lex.string] to read a little nicer and to omit the flowery language. I then updated [lex.phases] to be less precise and to explicitly direct the reader to [lex.string] for details. I hope this acceptably satisfies the (very reasonable) concern about the previous normative duplication.
This paper has now been submitted for the upcoming mailing and can be found at https://isocpp.org/files/papers/P2029R2.html. The previous links to the draft will no longer work.
Apologies for not looking through this earlier.
Ah, yes, that is better. I updated the paper and this change will be included in the mailing. Preview at https://isocpp.org/files/papers/P2029R2.html.
"""conditional-escape-sequence-char:
any member of the basic source character set other than u, U, x, and the members of octal-digit and simple-escape-sequence-char
"""
I don't like talking about "members of" grammar productions. How about:
any member of the basic source character set that is not an octal-digit, a simple-escape-sequence-char, or u, U, or x
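To make the suggested definition concrete, here is a minimal sketch of it as a predicate. The character lists are assumptions taken from my reading of the C++20 grammar (the 96-character basic source character set, and simple-escape-sequence-char as one of ' " ? \ a b f n r t v), not from the paper itself, and the function names are hypothetical.

```cpp
#include <cassert>
#include <string_view>

// Assumption: the C++20 basic source character set (96 characters).
constexpr std::string_view basic_source_chars =
    " \t\v\f\n"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";

constexpr bool contains(std::string_view set, char c) {
    return set.find(c) != std::string_view::npos;
}

// octal-digit: one of 0 1 2 3 4 5 6 7
constexpr bool is_octal_digit(char c) { return c >= '0' && c <= '7'; }

// Assumption: simple-escape-sequence-char is one of ' " ? \ a b f n r t v
constexpr bool is_simple_escape_sequence_char(char c) {
    return contains("'\"?\\abfnrtv", c);
}

// "any member of the basic source character set that is not an octal-digit,
// a simple-escape-sequence-char, or u, U, or x"
constexpr bool is_conditional_escape_sequence_char(char c) {
    return contains(basic_source_chars, c)
        && !is_octal_digit(c)
        && !is_simple_escape_sequence_char(c)
        && c != 'u' && c != 'U' && c != 'x';
}
```

Under these assumptions, 'q' and '8' qualify (so '\q' and '\8' would be conditional escape sequences), while 'n', '7', and 'x' do not.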
I meant the range of representable values. The general thinking is that numeric escape sequences are allowed to encode code unit values that are not valid according to the associated character encoding. Therefore, such sequences can encode 0xFF for UTF-8 literals, 0xFFFF for UTF-8 literals (if the underlying type of char8_t is greater than 8 bits), 0x1FFFF for UTF-16 literals (if the underlying type of char16_t is greater than 16 bits), and so on. While we could place more restrictions here, doing so would curb freedoms without, in my opinion, offering significant assurances of well-formed code unit sequences.
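As a small illustration of that freedom, the sketch below reads back code units written with numeric escapes. The 0xFF case is the one mentioned above; the lone surrogate 0xD800 in a UTF-16 literal is an additional example of my own, and the function names are hypothetical. It assumes 8-bit char8_t (or char before C++20) and 16-bit char16_t.

```cpp
#include <cassert>

// 0xFF never occurs in well-formed UTF-8, yet a numeric escape can place it
// in a UTF-8 string literal as a single code unit.
// The cast keeps this working whether u8"" yields char (C++17) or char8_t (C++20).
constexpr unsigned first_utf8_unit() {
    return static_cast<unsigned char>(u8"\xFF"[0]);
}

// 0xD800 is a lone surrogate, which is not well-formed UTF-16 on its own,
// but it is within the range of representable values of char16_t.
constexpr unsigned first_utf16_unit() {
    return u"\xD800"[0];
}

static_assert(first_utf8_unit() == 0xFFu, "code unit is stored as written");
static_assert(first_utf16_unit() == 0xD800u, "code unit is stored as written");
```

Neither literal is a valid encoded string, which is the point: the escape specifies a code unit value, not a character.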
5.13.3/Z.2.1:"""— If v does not exceed the range of the character-literal's type, then the value is v.
"""
What does "the range of the character-literal's type" mean? Do you mean the range of representable values? Or do you mean [0,0xFFFF] for char16_t and [0,0x10FFFF] for char32_t?
Thanks. For what it's worth, I think "range of representable values" is the right rule to use. (But the wording should use that term.)
5.13.3/Z.2.2:"""— Otherwise, if the character-literal's encoding-prefix is absent or L, then the value is implementation-defined.
"""
I appreciate that your wording reflects the behavior of the prior wording, but while we're here: do we really want '\xff' to have an implementation-defined value rather than being required to be (char)0xff (assuming 'char' is signed and 8-bit)? Now that we guarantee 2s complement, perhaps we should just say you always get the result of converting the given value to char / wchar_t? (Similarly in 5.3.15/Z.2.)
That seems reasonable, and I believe matches existing practice, but I'm not sure how to word it. Would we address cases like '\xFFFF' (with the same sign/size assumptions) explicitly? I don't think we can defer to the integral conversion rules since the source value doesn't have a specific type (the wording states "an integer value v"). Perhaps we could steal the "type that is congruent to the source integer modulo 2^N" wording?
"""
— Otherwise, if the character-literal's encoding-prefix is absent or L, then the value is the unique value of the character-literal's type t that is congruent to v modulo 2^N, where N is the width of t.
"""
Yes, that or something like it seems quite reasonable to me.
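For reference, the "congruent to v modulo 2^N" rule can be sketched as an ordinary function. This is only an illustration of the suggested wording, not text from the paper; the name wrap_to is hypothetical, and the final conversion to a signed type relies on the modular conversion that C++20 guarantees (implementation-defined, though universal in practice, before that).

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Returns the unique value of type T that is congruent to v modulo 2^N,
// where N is the width of T.
template <typename T>
constexpr T wrap_to(std::uintmax_t v) {
    static_assert(std::is_integral_v<T>, "T must be an integral type");
    using U = std::make_unsigned_t<T>;
    // Unsigned conversion is defined as reduction modulo 2^N.
    const U reduced = static_cast<U>(v);
    // Reinterpret the reduced value in T (modular since C++20).
    return static_cast<T>(reduced);
}

// Assuming 8-bit signed char, matching the sign/size assumptions above:
static_assert(wrap_to<signed char>(0xFF) == -1, "'\\xff' case");
static_assert(wrap_to<signed char>(0xFFFF) == -1, "'\\xFFFF' case");
static_assert(wrap_to<signed char>(0x100) == 0, "'\\x100' case");
```

Under this rule '\xFFFF' would get the same value as '\xff', and '\x100' would be 0 rather than being ill-formed or drawing a warning.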
I looked into this and found that gcc 10.1, clang 10, Visual C++
19.24, and icc 19 all accept '\xff' and produce a value of -1 as
expected, but for '\x100', gcc and icc emit a warning, and Clang
and Visual C++ reject. https://www.godbolt.org/z/6qa1b7.
That leads me to believe this should be considered more of an
evolutionary change and addressed in a different paper.
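A minimal sketch of the accepted case, for anyone who wants to reproduce it locally. It assumes an 8-bit char; whether the result is -1 or 255 depends on whether plain char is signed on the target. The function names are hypothetical, and the '\x100' case is left out because, as noted, some compilers reject it.

```cpp
#include <cassert>
#include <type_traits>

// '\xff' has type char; converting it to int exposes the signedness of char.
constexpr int xff_value() { return '\xff'; }

// -1 where plain char is signed (the configuration described above),
// 255 where it is unsigned.
constexpr int expected_xff_value() {
    return std::is_signed_v<char> ? -1 : 255;
}

static_assert(xff_value() == expected_xff_value(),
              "'\\xff' keeps the bit pattern 0xFF in an 8-bit char");
```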
Tom.
Tom.
Thanks!
Tom.
In the normative text, AFAICT, in C++20 wide multi character literals must be supported, with an implementation-defined value, but after this paper they will be conditionally supported. I don’t see that design change addressed in the front matter. Same applies to non-encodable wide characters.
That is addressed in the "Proposed resolution overview" section. I can add a statement about this to the introduction if you like.
I've been under the impression that the lack of conditionally-supported for these is an oversight. My understanding (and someone please correct me if I'm mistaken; I don't recall where I was informed of this) is that, in the C standard, implementation-defined includes an allowance for rejecting the code as ill-formed, but in the C++ standard, implementation-defined implies well-formed; hence the addition of conditionally-supported. If that understanding is correct, then the updated wording corrects alignment with the intent of the C standard.
(I thought this also applied to ordinary multi character literals, but it turns out they are already conditionally supported.)
Yup, in [lex.ccon]p1.
Tom.
AlisdairM
On Jul 7, 2020, at 16:33, Tom Honermann via Core <core@lists.isocpp.org> wrote:
An update of D2029R2 (Proposed resolution for core issues 411, 1656, and 2333; numeric and universal character escapes in character and string literals) is now available at https://rawgit.com/sg16-unicode/sg16/master/papers/d2029r2.html. This addresses the feedback provided on the core mailing list in the thread starting at https://lists.isocpp.org/core/2020/06/9455.php. Wording review feedback prior to the next Core issues processing teleconference would be much appreciated!
Tom.
_______________________________________________
Core mailing list
Core@lists.isocpp.org
Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/core
Link to this post: http://lists.isocpp.org/core/2020/07/9545.php
Core mailing list
Core@lists.isocpp.org
Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/core
Link to this post: http://lists.isocpp.org/core/2020/07/9570.php