Re: [SG16] [isocpp-core] Updated draft revision: D2029R2 (Proposed resolution for core issues 411, 1656, and 2333; numeric and universal character escapes in character and string literals)

From: Jens Maurer <Jens.Maurer_at_[hidden]>
Date: Sat, 18 Jul 2020 08:57:37 +0200
On 18/07/2020 08.48, Tom Honermann via SG16 wrote:
> On 7/15/20 3:21 AM, Richard Smith wrote:
>> On Tue, 14 Jul 2020, 22:56 Tom Honermann, <tom_at_[hidden]> wrote:
>>
>> On 7/14/20 3:23 AM, Richard Smith wrote:

>>> 5.13.3/Z.2.2:
>>> """
>>> — Otherwise, if the character-literal's encoding-prefix is absent or L, then the value is implementation-defined.
>>> """
>>>
>>> I appreciate that your wording reflects the behavior of the prior wording, but while we're here: do we really want '\xff' to have an implementation-defined value rather than being required to be (char)0xff (assuming 'char' is signed and 8-bit)? Now that we guarantee 2's complement, perhaps we should just say you always get the result of converting the given value to char / wchar_t? (Similarly in 5.3.15/Z.2.)
>>
>> That seems reasonable, and I believe matches existing practice, but I'm not sure how to word it. Would we address cases like '\xFFFF' (with the same sign/size assumptions) explicitly? I don't think we can defer to the integral conversion rules since the source value doesn't have a specific type (the wording states "an integer value v"). Perhaps we could steal the "type that is congruent to the source integer modulo 2^N" wording?
>>
>> """
>> — Otherwise, if the character-literal's encoding-prefix is absent or L, then the value is the unique value of the /character-literal/'s type t that is congruent to v modulo 2^N, where N is the width of t.
>> """
>>
>> Yes, that or something like it seems quite reasonable to me.
>
> I looked into this and found that gcc 10.1, Clang 10, Visual C++ 19.24, and icc 19 all accept '\xff' and produce a value of -1 as expected, but for '\x100', gcc and icc emit a warning, and Clang and Visual C++ reject it. https://www.godbolt.org/z/6qa1b7. That leads me to believe this should be considered more of an evolutionary change and addressed in a different paper.
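
For reference, a minimal illustration of the two literals observed above (this assumes char is a signed 8-bit type, as on the x86-64 targets in the linked Godbolt, and is illustrative only):

    static_assert('\xff' == -1);   // accepted by gcc 10.1, Clang 10, VC++ 19.24, icc 19;
                                   // value is -1 when char is signed and 8-bit
    // char c = '\x100';           // does not fit in an 8-bit char:
                                   // gcc and icc warn, Clang and VC++ reject it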

A hex number is conceptually unsigned. We could say we take the character-literal's
type (or its underlying type, if any), take the unsigned type corresponding to that
(if it is not already unsigned), and the "modulo 2^N" behavior applies only if
the hex value is in the range of representable values for that unsigned type.
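
A rough constexpr sketch of that rule for a plain character-literal of type char
(the function name is illustrative only, not proposed wording; the asserts assume
char is a signed 8-bit type):

    #include <cstdint>
    #include <limits>
    #include <optional>

    // Illustrative only: v is the (conceptually unsigned) value of the numeric
    // escape. It is accepted only if it fits in the unsigned type corresponding
    // to char; the result is the char value congruent to v modulo 2^N.
    constexpr std::optional<char> escape_value(std::uint64_t v)
    {
        using U = unsigned char;                       // unsigned counterpart of char
        if (v > std::numeric_limits<U>::max())
            return std::nullopt;                       // e.g. '\x100' stays ill-formed
        return static_cast<char>(static_cast<U>(v));   // e.g. 0xFF -> -1 for signed 8-bit char
    }

    static_assert(escape_value(0xFF).value() == -1);   // assumes char is signed and 8-bit
    static_assert(!escape_value(0x100).has_value());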

Jens

Received on 2020-07-18 02:01:00