Re: [SG16] [isocpp-core] Updated draft revision: D2029R2 (Proposed resolution for core issues 411, 1656, and 2333; numeric and universal character escapes in character and string literals)

From: Tom Honermann <tom_at_[hidden]>
Date: Wed, 15 Jul 2020 01:56:42 -0400
On 7/14/20 3:23 AM, Richard Smith wrote:
> On Mon, Jul 13, 2020 at 9:03 PM Tom Honermann via Core
> <core_at_[hidden] <mailto:core_at_[hidden]>> wrote:
> On 7/8/20 1:54 PM, Tom Honermann wrote:
>> On 7/8/20 6:43 AM, Alisdair Meredith wrote:
>>> Minor nit: I dislike normatively stating that a null character is
>>> appended after string concatenation in two places. I do like
>>> the addition of this directly to the phase 6 wording, so suggest
>>> that the original in [lex.string]p12 with its extra flowery language
>>> be demoted to a note.
>> That seems reasonable to me, I'll do so.
> After looking at this again, I elected to go in a different direction.
> [lex.phases] describes at a high level what is to be done in each
> phase and more-or-less defers to other sections for elaboration.
> Through this lens, changing the normative text in [lex.string] into a
> note felt like the wrong direction. Instead, I chose to update
> the wording in [lex.string] to read a little nicer and to omit the
> flowery language. I then updated [lex.phases] to be less precise
> and to explicitly direct the reader to [lex.string] for details.
> I hope this acceptably satisfies the (very reasonable) concern
> about the previous normative duplication.
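For readers following the phase 6 discussion, the observable effect of both passages can be sketched in a few lines of C++17 or later: adjacent string-literals are joined first, and a single terminating null character belongs to the combined literal. The name `joined` is illustrative only; the example shows the resulting behavior, not the proposed wording.

```cpp
// Adjacent string-literals are concatenated into one literal; the
// terminating null character is then appended once, to the result,
// not to each piece.
constexpr char joined[] = "ab" "cd";

static_assert(sizeof(joined) == 5);  // 'a', 'b', 'c', 'd', '\0'
static_assert(joined[2] == 'c');     // no '\0' between the pieces
static_assert(joined[4] == '\0');    // exactly one terminator, at the end
```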
> This paper has now been submitted for the upcoming mailing and can
> be found at https://isocpp.org/files/papers/P2029R2.html. The
> previous links to the draft will no longer work.
> Apologies for not looking through this earlier.
No problem, thank you for the feedback!
> """
> conditional-escape-sequence-char:
> any member of the basic source character set other than u, U, x,
> and the members of octal-digit and simple-escape-sequence-char
> """
> I don't like talking about "members of" grammar productions. How about:
> any member of the basic source character set that is not an
> /octal-digit/, a /simple-escape-sequence-char/, or u, U, or x
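Richard's rewording of /conditional-escape-sequence-char/ can be made concrete with a small classifier. This is a sketch under stated assumptions: the helper names are hypothetical, and the character lists reflect our reading of the C++20 basic source character set ([lex.charset]) and of /simple-escape-sequence-char/ ([lex.ccon]).

```cpp
#include <string_view>

// Hypothetical helpers (names are ours, not the standard's) mirroring the
// grammar terms above, using the C++20 basic source character set.
constexpr bool is_basic_source_char(char c) {
    // 91 graphic characters plus space, horizontal tab, vertical tab,
    // form feed, and new-line.
    constexpr std::string_view graphic =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789"
        "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";
    return c == ' ' || c == '\t' || c == '\v' || c == '\f' || c == '\n' ||
           graphic.find(c) != std::string_view::npos;
}

constexpr bool is_octal_digit(char c) { return c >= '0' && c <= '7'; }

constexpr bool is_simple_escape_sequence_char(char c) {
    // ' " ? \ a b f n r t v
    return std::string_view("'\"?\\abfnrtv").find(c) != std::string_view::npos;
}

// Richard's phrasing: any member of the basic source character set that is
// not an octal-digit, a simple-escape-sequence-char, or u, U, or x.
constexpr bool is_conditional_escape_sequence_char(char c) {
    return is_basic_source_char(c) && !is_octal_digit(c) &&
           !is_simple_escape_sequence_char(c) && c != 'u' && c != 'U' &&
           c != 'x';
}

static_assert(is_conditional_escape_sequence_char('c'));   // '\c'
static_assert(is_conditional_escape_sequence_char('8'));   // not an octal digit
static_assert(!is_conditional_escape_sequence_char('n'));  // simple escape
static_assert(!is_conditional_escape_sequence_char('x'));  // hex escape introducer
static_assert(!is_conditional_escape_sequence_char('@'));  // not in the basic set
```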
Ah, yes, that is better. I updated the paper and this change will be
included in the mailing. Preview at
> 5.13.3/Z.2.1:
> """
> — If v does not exceed the range of the character-literal's type, then
> the value is v.
> """
> What does "the range of the character-literal's type" mean? Do you
> mean the range of representable values? Or do you mean [0,0xFFFF] for
> char16_t and [0,0x10FFFF] for char32_t?
I meant the range of representable values. The general thinking is that
numeric escape sequences are allowed to encode code unit values that are
not valid according to the associated character encoding. Therefore,
such sequences can encode 0xFF for UTF-8 literals, 0xFFFF for UTF-8
literals (if the underlying type of char8_t is wider than 8 bits),
0x1FFFF for UTF-16 literals (if the underlying type of char16_t is
wider than 16 bits), etc. While we could place more restrictions
here, doing so would curb freedoms without, in my opinion, offering
significant assurances of well-formed code unit sequences.
> 5.13.3/Z.2.2:
> """
> — Otherwise, if the character-literal's encoding-prefix is absent or
> L, then the value is implementation-defined.
> """
> I appreciate that your wording reflects the behavior of the prior
> wording, but while we're here: do we really want '\ff' to have an
> implementation-defined value rather than being required to be
> (char)0xff (assuming 'char' is signed and 8-bit)? Now that we guarantee
> two's complement, perhaps we should just say you always get the result of
> converting the given value to char / wchar_t? (Similarly in 5.13.5/Z.2.)

That seems reasonable, and I believe matches existing practice, but I'm
not sure how to word it. Would we address cases like '\xFFFF' (with the
same sign/size assumptions) explicitly? I don't think we can defer to
the integral conversion rules since the source value doesn't have a
specific type (the wording states "an integer value v"). Perhaps we
could steal the "type that is congruent to the source integer modulo 2^N"
wording from the integral conversion rules:

— Otherwise, if the character-literal's encoding-prefix is absent or L,
then the value is the unique value of the /character-literal/'s type t
that is congruent to v modulo 2^N, where N is the width of t.
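The proposed rule can be sketched without relying on the source value having a type, by reducing v modulo 2^N and then mapping the residue into the range of t. `congruent_mod_2n` is a hypothetical helper written for this illustration; it assumes t is at most 64 bits wide and uses fixed-width types so the checks do not depend on the signedness of plain `char`.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper (not from the paper): the unique value of integer
// type T congruent to v modulo 2^N, where N is the width of T.
// Assumes N <= 64.
template <typename T>
constexpr T congruent_mod_2n(std::uint64_t v) {
    static_assert(std::is_integral_v<T> && !std::is_same_v<T, bool>);
    using U = std::make_unsigned_t<T>;
    // Converting to an unsigned type keeps the low N bits, i.e. v mod 2^N.
    const U r = static_cast<U>(v);
    if constexpr (std::is_unsigned_v<T>) {
        return static_cast<T>(r);
    } else {
        // Residues no larger than the maximum of T denote themselves...
        if (r <= static_cast<U>(std::numeric_limits<T>::max()))
            return static_cast<T>(r);
        // ...larger residues r denote r - 2^N, i.e. -(2^N - 1 - r) - 1,
        // computed here without signed overflow.
        const U complement = static_cast<U>(~r);  // 2^N - 1 - r
        return static_cast<T>(-static_cast<std::int64_t>(complement) - 1);
    }
}

static_assert(congruent_mod_2n<std::int8_t>(0xFF) == -1);    // '\xff', signed 8-bit
static_assert(congruent_mod_2n<std::int8_t>(0xFFFF) == -1);  // '\xFFFF', same assumptions
static_assert(congruent_mod_2n<std::int8_t>(0x80) == -128);
static_assert(congruent_mod_2n<std::uint16_t>(0x1FFFF) == 0xFFFF);
```

For these inputs the results agree with `static_cast` to the same type, which C++20 also defines modulo 2^N; the explicit arithmetic simply mirrors the wording above.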


> Thanks!
> Tom.
>>> In the normative text, AFAICT, in C++20 wide multicharacter
>>> literals must be supported, with an implementation-defined value,
>>> but after this paper they will be conditionally supported. I don’t
>>> see that design change addressed in the front matter. The same
>>> applies to non-encodable wide characters.
>> That is addressed in the "Proposed resolution overview" section.
>> I can add a statement about this to the introduction if you like.
>> I've been under the impression that the lack of
>> conditionally-supported status for these is an oversight. My
>> understanding (and someone please correct me if I'm mistaken; I
>> don't recall where I was informed of this) is that, in the C
>> standard, implementation-defined includes an allowance for
>> rejecting the code as ill-formed, but in the C++ standard,
>> implementation-defined implies well-formed; hence the addition of
>> conditionally-supported. If that understanding is correct, then
>> the updated wording brings C++ into line with the intent of the C
>> standard.
>>> (I thought this also applied to ordinary multicharacter literals,
>>> but it turns out they are already conditionally supported.)
>> Yup, in [lex.ccon]p1 <http://eel.is/c++draft/lex.ccon#1.sentence-4>.
>> Tom.
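For comparison, an ordinary multicharacter literal as [lex.ccon] treats it today: conditionally-supported, of type `int`, with an implementation-defined value. The sketch assumes an implementation that supports such literals (GCC and Clang do, typically with a `-Wmultichar` warning), so it checks only the type and leaves the value alone. The name `multichar` is illustrative only.

```cpp
#include <type_traits>

// Assumes multicharacter literals are supported by the implementation.
// The type is int by rule; the value is implementation-defined, so it is
// deliberately not checked here.
constexpr int multichar = 'ab';
static_assert(std::is_same_v<decltype('ab'), int>);
```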
>>> AlisdairM
>>>> On Jul 7, 2020, at 16:33, Tom Honermann via Core <core_at_[hidden]> <mailto:core_at_[hidden]> wrote:
>>>> An update of D2029R2 (Proposed resolution for core issues 411, 1656, and 2333; numeric and universal character escapes in character and string literals) is now available at https://rawgit.com/sg16-unicode/sg16/master/papers/d2029r2.html. This addresses the feedback provided on the core mailing list in the thread starting at https://lists.isocpp.org/core/2020/06/9455.php.
>>>> Wording review feedback prior to the next Core issues processing teleconference would be much appreciated!
>>>> Tom.
>>>> _______________________________________________
>>>> Core mailing list
>>>> Core_at_[hidden] <mailto:Core_at_[hidden]>
>>>> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/core
>>>> Link to this post: http://lists.isocpp.org/core/2020/07/9545.php
> _______________________________________________
> Core mailing list
> Core_at_[hidden] <mailto:Core_at_[hidden]>
> Subscription: https://lists.isocpp.org/mailman/listinfo.cgi/core
> Link to this post: http://lists.isocpp.org/core/2020/07/9570.php

Received on 2020-07-15 01:00:10