Re: [SG16] [isocpp-core] Updated draft revision: D2029R2 (Proposed resolution for core issues 411, 1656, and 2333; numeric and universal character escapes in character and string literals)

From: Richard Smith <richardsmith_at_[hidden]>
Date: Wed, 15 Jul 2020 00:21:44 -0700
On Tue, 14 Jul 2020, 22:56 Tom Honermann, <tom_at_[hidden]> wrote:

> On 7/14/20 3:23 AM, Richard Smith wrote:
>
> On Mon, Jul 13, 2020 at 9:03 PM Tom Honermann via Core <
> core_at_[hidden]> wrote:
>
>> On 7/8/20 1:54 PM, Tom Honermann wrote:
>>
>> On 7/8/20 6:43 AM, Alisdair Meredith wrote:
>>
>> Minor nit: I dislike normatively stating that a null character is
>> appended after string concatenation in two places. I do like
>> the addition of this directly to the phase 6 wording, so suggest
>> that the original in [lex.string]p12 with its extra flowery language
>> be demoted to a note.
>>
>> That seems reasonable to me, I'll do so.
>>
>> After looking at this again, I elected to go in a different direction.
>>
>> [lex.phases] describes at a high level what is to be done in each phase
>> and more-or-less defers to other sections for elaboration. From this lens,
>> changing the normative text in [lex.string] into a note felt like the wrong
>> direction. Instead, I chose to update the wording in [lex.string] to read
>> a little nicer and to omit the flowery language. I then updated
>> [lex.phases] to be less precise and to explicitly direct the reader to
>> [lex.string] for details. I hope this acceptably satisfies the (very
>> reasonable) concern about the previous normative duplication.
>>
>> This paper has now been submitted for the upcoming mailing and can be
>> found at https://isocpp.org/files/papers/P2029R2.html. The previous
>> links to the draft will no longer work.
>>
> Apologies for not looking through this earlier.
>
> No problem, thank you for the feedback!
>
>
> """
> conditional-escape-sequence-char:
> any member of the basic source character set other than u, U, x, and
> the members of octal-digit and simple-escape-sequence-char
> """
>
> I don't like talking about "members of" grammar productions. How about:
>
> any member of the basic source character set that is not an
> *octal-digit*, a *simple-escape-sequence-char*, or u, U, or x
>
> Ah, yes, that is better. I updated the paper and this change will be
> included in the mailing. Preview at
> https://isocpp.org/files/papers/P2029R2.html.
>
>
> 5.13.3/Z.2.1:
> """
> — If v does not exceed the range of the character-literal's type, then the
> value is v.
> """
>
> What does "the range of the character-literal's type" mean? Do you mean
> the range of representable values? Or do you mean [0,0xFFFF] for char16_t
> and [0,0x10FFFF] for char32_t?
>
> I meant the range of representable values. The general thinking is that
> numeric escape sequences are allowed to encode code unit values that are
> not valid according to the associated character encoding. Therefore, such
> sequences can encode 0xFF for UTF-8 literals, 0xFFFF for UTF-8 literals (if
> the underlying type of char8_t is greater than 8 bits), 0x1FFFF for UTF-16
> literals (if the underlying type of char16_t is greater than 16 bits),
> etc... While we could place more restrictions here, doing so would curb
> freedoms without, in my opinion, offering significant assurances of
> well-formed code unit sequences.
>

Thanks. For what it's worth, I think "range of representable values" is the
right rule to use. (But the wording should use that term.)
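
To make the "range of representable values" reading concrete, here is a minimal sketch. The helper name fits_in_range is hypothetical and appears neither in the paper nor in the standard; it only illustrates that the check is against the limits of the literal's type, not against the valid code units or code points of the associated encoding.

```cpp
#include <limits>

// Hypothetical helper (not from P2029): true if the value v named by a
// numeric escape sequence lies within the range of representable values
// of the literal's character type CharT.
template <typename CharT>
constexpr bool fits_in_range(unsigned long long v) {
    return v <= static_cast<unsigned long long>(
                    std::numeric_limits<CharT>::max());
}
```

Under this reading, fits_in_range<char32_t>(0x110000) holds even though 0x110000 is not a Unicode code point, because char32_t can represent it.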

> 5.13.3/Z.2.2:
> """
> — Otherwise, if the character-literal's encoding-prefix is absent or L,
> then the value is implementation-defined.
> """
>
> I appreciate that your wording reflects the behavior of the prior wording,
> but while we're here: do we really want '\xff' to have an
> implementation-defined value rather than being required to be (char)0xff
> (assuming 'char' is signed and 8-bit)? Now we guarantee 2s complement,
> perhaps we should just say you always get the result of converting the
> given value to char / wchar_t? (Similarly in 5.3.15/Z.2.)
>
> That seems reasonable, and I believe matches existing practice, but I'm
> not sure how to word it. Would we address cases like '\xFFFF' (with the
> same sign/size assumptions) explicitly? I don't think we can defer to the
> integral conversion rules since the source value doesn't have a specific
> type (the wording states "an integer value v"). Perhaps we could steal the
> "type that is congruent to the source integer modulo 2^N" wording?
>
> """
> — Otherwise, if the character-literal's encoding-prefix is absent or L,
> then the value is the unique value of the *character-literal*'s type t
> that is congruent to v modulo 2^N, where N is the width of t.
> """
>
Yes, that or something like it seems quite reasonable to me.
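
As a rough illustration of what the suggested "congruent to v modulo 2^N" rule would produce, here is a small sketch. The name wrap_to is hypothetical and the function is not proposed wording. It assumes two's complement (guaranteed as of C++20; before that, the final unsigned-to-signed conversion is implementation-defined, though mainstream compilers behave the same way).

```cpp
#include <type_traits>

// Hypothetical helper (not from P2029): the unique value of T that is
// congruent to v modulo 2^N, where N is the width of T.
template <typename T>
constexpr T wrap_to(unsigned long long v) {
    using U = typename std::make_unsigned<T>::type;
    // Conversion to the unsigned counterpart reduces v modulo 2^N.
    // Conversion from U to T then keeps the same bit pattern
    // (modular in C++20; implementation-defined in earlier standards).
    return static_cast<T>(static_cast<U>(v));
}
```

With an 8-bit signed char, wrap_to<signed char>(0xFF) and wrap_to<signed char>(0xFFFF) both give -1, which is the (char)0xff result discussed above.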

> Tom.
>
>
> Thanks!
>
>> Tom.
>>
>> In the normative text, AFAICT, in C++20 wide multi character
>> literals must be supported, with an implementation-defined value,
>> but after this paper they will be conditionally supported. I don’t
>> see that design change addressed in the front matter. Same
>> applies to non-encodable wide characters.
>>
>> That is addressed in the "Proposed resolution overview" section. I can
>> add a statement about this to the introduction if you like.
>>
>> I've been under the impression that the lack of conditionally-supported
>> for these is an oversight. My understanding (and someone please correct me
>> if I'm mistaken; I don't recall where I was informed of this) is that, in
>> the C standard, implementation-defined includes an allowance for rejecting
>> the code as ill-formed, but in the C++ standard, implementation-defined
>> implies well-formed; hence the addition of conditionally-supported. If
>> that understanding is correct, then the updated wording corrects alignment
>> with the intent of the C standard.
>>
>> (I thought this also applied to ordinary multi character literals,
>> but it turns out they are already conditionally supported.)
>>
>> Yup, in [lex.ccon]p1 <http://eel.is/c++draft/lex.ccon#1.sentence-4>.
>>
>> Tom.
>>
>> AlisdairM
>>
>>
>> On Jul 7, 2020, at 16:33, Tom Honermann via Core <core_at_[hidden]> wrote:
>>
>> An update of D2029R2 (Proposed resolution for core issues 411, 1656, and 2333; numeric and universal character escapes in character and string literals) is now available at https://rawgit.com/sg16-unicode/sg16/master/papers/d2029r2.html. This addresses the feedback provided on the core mailing list in the thread starting at https://lists.isocpp.org/core/2020/06/9455.php.
>>
>> Wording review feedback prior to the next Core issues processing teleconference would be much appreciated!
>>
>> Tom.
>>
>> _______________________________________________
>> Core mailing list
>> Core_at_[hidden]
>> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/core
>> Link to this post: http://lists.isocpp.org/core/2020/07/9545.php
>>
>>
>>
>> _______________________________________________
>> Core mailing list
>> Core_at_[hidden]
>> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/core
>> Link to this post: http://lists.isocpp.org/core/2020/07/9570.php
>>
>
>

Received on 2020-07-15 02:25:17