sg16: Re: [SG16] [isocpp-core] Updated draft revision: D2029R2 (Proposed resolution for core issues 411, 1656, and 2333; numeric and universal character escapes in character and string literals)

From: Tom Honermann <tom_at_[hidden]>
Date: Sun, 19 Jul 2020 23:52:42 -0400

On 7/18/20 2:57 AM, Jens Maurer wrote:
> On 18/07/2020 08.48, Tom Honermann via SG16 wrote:
>> On 7/15/20 3:21 AM, Richard Smith wrote:
>>> On Tue, 14 Jul 2020, 22:56 Tom Honermann, <tom_at_[hidden] <mailto:tom_at_[hidden]>> wrote:
>>>
>>> On 7/14/20 3:23 AM, Richard Smith wrote:
>>>> 5.13.3/Z.2.2:
>>>> """
>>>> — Otherwise, if the character-literal's encoding-prefix is absent or L, then the value is implementation-defined.
>>>> """
>>>>
>>>> I appreciate that your wording reflects the behavior of the prior wording, but while we're here: do we really want '\xff' to have an implementation-defined value rather than being required to be (char)0xff (assuming 'char' is signed and 8-bit)? Now that we guarantee two's complement, perhaps we should just say you always get the result of converting the given value to char / wchar_t? (Similarly in 5.13.5/Z.2.)
>>> That seems reasonable, and I believe matches existing practice, but I'm not sure how to word it. Would we address cases like '\xFFFF' (with the same sign/size assumptions) explicitly? I don't think we can defer to the integral conversion rules since the source value doesn't have a specific type (the wording states "an integer value v"). Perhaps we could steal the "type that is congruent to the source integer modulo 2^N" wording?
>>>
>>> """
>>> — Otherwise, if the character-literal's encoding-prefix is absent or L, then the value is the unique value of the /character-literal/'s type t that is congruent to v modulo 2^N, where N is the width of t.
>>> """
>>>
>>> Yes, that or something like it seems quite reasonable to me.
>> I looked into this and found that gcc 10.1, clang 10, Visual C++ 19.24, and icc 19 all accept '\xff' and produce a value of -1 as expected, but for '\x100', gcc and icc emit a warning, and Clang and Visual C++ reject. https://www.godbolt.org/z/6qa1b7. That leads me to believe this should be considered more of an evolutionary change and addressed in a different paper.
> A hex number is conceptually unsigned. We could say we take the character-literal's
> type (or its underlying type, if any), take the unsigned type corresponding to that
> (if it's not already unsigned), and you only get the "modulo 2^N" behavior if
> the hex value is in the range of representable values for that unsigned type.

Ok, that seems pretty workable. I updated the D2029R3 draft
<https://rawgit.com/sg16-unicode/sg16/master/papers/d2029r3.html> and
posted it to the wiki
<https://wiki.edg.com/bin/view/Wg21summer2020/CoreWorkingGroup> for the
core issues processing telecon on Monday. A blurb has been added to the
introduction and PR overview. The wording update states:

> [lex.ccon]pZ.3: Otherwise, if the character-literal's encoding-prefix
> is absent or L, and V does not exceed the range of representable values
> of the corresponding unsigned type for the underlying type of the
> character-literal's type, then the value is the unique value of the
> character-literal's type T that is congruent to V modulo 2^N, where N
> is the width of T.

> [lex.string]pZ.2: Otherwise, if the string-literal's encoding-prefix
> is absent or L, and V does not exceed the range of representable values
> of the corresponding unsigned type for the underlying type of the
> string-literal's code unit type, then the value is the unique value of
> the string-literal's code unit type T that is congruent to V modulo 2^N,
> where N is the width of T.
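
To make the intent of the two paragraphs above concrete, here is a rough C++ sketch of the rule as I read it. It is an illustration only, not part of the proposed wording: escape_value and demo are hypothetical names, and the sketch assumes an 8-bit signed char type and an implementation where unsigned-to-signed conversion wraps modulo 2^N (guaranteed since C++20, implementation-defined before that).

```cpp
#include <cassert>
#include <limits>
#include <type_traits>

// Hypothetical helper, not from D2029: models the value of a numeric escape
// sequence V in a literal whose code unit type is T (no prefix or L).
// Returns false when V exceeds the range of the unsigned type corresponding
// to T; the quoted wording does not assign a value in that case.
template <typename T>
bool escape_value(unsigned long long v, T& out) {
    typedef typename std::make_unsigned<T>::type U;
    if (v > std::numeric_limits<U>::max()) {
        return false;  // e.g. '\x100' with an 8-bit char
    }
    // In range: take the value of T congruent to v modulo 2^N.
    // Assumption: this conversion wraps (required as of C++20).
    out = static_cast<T>(static_cast<U>(v));
    return true;
}

// Small self-check mirroring the cases discussed in this thread.
inline void demo() {
    signed char c = 0;
    assert(escape_value<signed char>(0xFF, c) && c == -1);  // like '\xff'
    assert(!escape_value<signed char>(0x100, c));           // like '\x100'
}
```

Under those assumptions, 0xFF maps to -1 for an 8-bit signed type, matching what the four compilers already do for '\xff', while 0x100 falls outside the new rule, matching the cases that are diagnosed today.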

Tom.

>
> Jens

Received on 2020-07-19 22:56:03