On 18/07/2020 08.48, Tom Honermann via SG16 wrote:
> On 7/15/20 3:21 AM, Richard Smith wrote:
>> On Tue, 14 Jul 2020, 22:56 Tom Honermann, <tom@honermann.net> wrote:
>>> On 7/14/20 3:23 AM, Richard Smith wrote:
>>>> 5.13.3/Z.2.2:
>>>> """
>>>> — Otherwise, if the character-literal's encoding-prefix is absent or L, then the value is implementation-defined.
>>>> """
>>>> I appreciate that your wording reflects the behavior of the prior wording, but while we're here: do we really want '\ff' to have an implementation-defined value rather than being required to be (char)0xff (assuming 'char' is signed and 8-bit)? Now that we guarantee 2s complement, perhaps we should just say you always get the result of converting the given value to char / wchar_t? (Similarly in 5.3.15/Z.2.)
>>>
>>> That seems reasonable, and I believe it matches existing practice, but I'm not sure how to word it. Would we address cases like '\xFFFF' (with the same sign/size assumptions) explicitly? I don't think we can defer to the integral conversion rules since the source value doesn't have a specific type (the wording states "an integer value v"). Perhaps we could steal the "type that is congruent to the source integer modulo 2^N" wording?
>>> """
>>> — Otherwise, if the character-literal's encoding-prefix is absent or L, then the value is the unique value of the /character-literal/'s type t that is congruent to v modulo 2^N, where N is the width of t.
>>> """
>>
>> Yes, that or something like it seems quite reasonable to me.
>
> I looked into this and found that gcc 10.1, Clang 10, Visual C++ 19.24, and icc 19 all accept '\xff' and produce a value of -1 as expected, but for '\x100', gcc and icc emit a warning, and Clang and Visual C++ reject it. https://www.godbolt.org/z/6qa1b7. That leads me to believe this should be considered more of an evolutionary change and addressed in a different paper.

A hex number is conceptually unsigned.
We could say we take the character-literal's type (or its underlying type, if any), take the unsigned type corresponding to that (if it's not already unsigned), and you only get the "modulo 2^N" behavior if the hex value is in the range of representable values for that unsigned type.
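
To make the suggestion above concrete, here is a minimal sketch of the rule as I read it. The helper name escape_value is hypothetical (it appears in neither the paper nor the standard), and returning an empty optional for out-of-range values is only a placeholder, since the suggestion does not say what happens in that case. It also leans on std::make_unsigned_t to stand in for "the unsigned type corresponding to the literal's type".

```cpp
#include <limits>
#include <optional>
#include <type_traits>

// Hypothetical helper, not taken from the paper: models the value that a
// hex or octal escape with numeric value v would get in a literal whose
// type is CharT (char or wchar_t).
template <typename CharT>
constexpr std::optional<CharT> escape_value(unsigned long long v) {
    // The unsigned type corresponding to the literal's type.
    using U = std::make_unsigned_t<CharT>;
    // Outside the range of U: the "modulo 2^N" rule does not apply.
    // What happens instead is left open here (assumption: report "no value").
    if (v > std::numeric_limits<U>::max()) {
        return std::nullopt;
    }
    // Within range: convert through U, which yields the value of CharT
    // congruent to v modulo 2^N (guaranteed since C++20; existing practice
    // on two's complement implementations before that).
    return static_cast<CharT>(static_cast<U>(v));
}

// With an 8-bit char, 0xff is in range and 0x100 is not.
static_assert(escape_value<char>(0xff).has_value());
static_assert(!escape_value<char>(0x100).has_value());
```

With a signed 8-bit char, escape_value<char>(0xff) holds -1, which matches what the four compilers mentioned earlier produce for '\xff'.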
Ok, that seems pretty workable. I updated the D2029R3 draft and posted it to the wiki for the core issues processing telecon on Monday. A blurb has been added to the introduction and PR overview. The wording update states:
> [lex.ccon]pZ.3: Otherwise, if the character-literal's encoding-prefix is absent or L, and V does not exceed the range of representable values of the corresponding unsigned type for the underlying type of the character-literal's type, then the value is the unique value of the character-literal's type T that is congruent to V modulo 2^N, where N is the width of T.
> [lex.string]pZ.2: Otherwise, if the string-literal's encoding-prefix is absent or L, and V does not exceed the range of representable values of the corresponding unsigned type for the underlying type of the string-literal's code unit type, then the value is the unique value of the string-literal's code unit type T that is congruent to V modulo 2^N, where N is the width of T.
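
For illustration, the checks below show what the wording above would guarantee for in-range escapes. This is a sketch that assumes an 8-bit char; the name as_int is mine. These assertions already hold on the implementations discussed in this thread, where the value is currently implementation-defined.

```cpp
#include <climits>
#include <type_traits>

// Assumption: char is 8 bits wide, as on the compilers tested in the thread.
static_assert(CHAR_BIT == 8);

// 0xff fits in unsigned char, so '\xff' is the char value congruent to
// 255 modulo 2^8: -1 if char is signed, 255 if char is unsigned.
static_assert('\xff' == static_cast<char>(0xff));
constexpr int as_int = '\xff';
static_assert(as_int == (std::is_signed_v<char> ? -1 : 255));

// The same rule applies to code units in string literals and to the
// L prefix.
static_assert("\xff"[0] == '\xff');
static_assert(L'\xff' == static_cast<wchar_t>(0xff));

// '\x100' exceeds the range of unsigned char, so the wording above does not
// give it a value; gcc and icc warn on it, Clang and Visual C++ reject it.
```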
Tom.
Jens