ISOCPP std-proposals List: [std-proposals] Reduce undefined behavior of signed integer literal arithmetic operations

From: 萧叶轩 <bizwen_at_[hidden]>
Date: Thu, 27 Apr 2023 19:35:00 +0000

Reduce undefined behavior of signed integer literal arithmetic operations

Abstract

Apply integral promotion to arithmetic operations on signed integer literals to reduce undefined behavior.

Background

According to:

basic.fundamental/1<http://eel.is/c++draft/basic.fundamental#1> : The range of representable values for a signed integer type is −2^(N−1) to 2^(N−1) − 1, where N is the width of the type.

basic.fundamental/2<http://eel.is/c++draft/basic.fundamental#2> : Overflow for signed arithmetic yields undefined behavior.

expr.pre/4<http://eel.is/c++draft/expr.pre#4> : If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined.
Consider the following code, in which each line has undefined behavior:

  auto a = INT_MAX + 1;
  auto b = -INT_MIN;
  long long c = INT_MAX + 1;
  long long d = -INT_MIN;

GCC and Clang can diagnose that both `INT_MAX + 1` and `-INT_MIN` have undefined behavior, while MSVC diagnoses only `INT_MAX + 1`.
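
For comparison, here is a minimal sketch of how the same values can be obtained today without undefined behavior, by widening one operand by hand before the arithmetic takes place. The names `widened_sum` and `widened_neg` are made up for illustration, and the sketch assumes that `long long` has a larger range than `int` and that `int` uses two's complement (guaranteed since C++20).

```cpp
#include <climits>

// Assumption: long long can hold values beyond the range of int.
static_assert(LLONG_MAX > INT_MAX, "sketch assumes long long is wider than int");

// Converting one operand first makes the arithmetic happen in long long,
// so the mathematical result is representable and nothing overflows.
constexpr long long widened_sum = static_cast<long long>(INT_MAX) + 1;
constexpr long long widened_neg = -static_cast<long long>(INT_MIN);

// With two's complement int, both values equal 2^(N-1) for an N-bit int.
static_assert(widened_sum == widened_neg, "both are one past INT_MAX");

// By contrast, this would be rejected, because signed overflow is not
// permitted in a constant expression:
// constexpr long long bad = INT_MAX + 1;
```

The proposal aims to make this widening happen automatically when the operands are literals.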

Solution

Add a rule that, when the operands of an operator are literals, integral promotion is applied so that the type of the result is wide enough to store the result value. If none of the standard signed integer types is large enough to store the value, allow the use of implementation-defined extended signed integer types. The program is ill-formed if no integer type is large enough to store the value (as with the rules for integer literals (lex.icon/4)<http://eel.is/c++draft/lex.icon#4>).

    auto a = INT_MAX + 1; // type of a is long, long long, or an extended signed integer type
    auto b = 1;
    auto c = INT_MAX + b; // type of c is still int
    auto d = int{1} + INT_MAX; // still int; the compiler may give a warning
    auto e = LONG_MAX + 1; // may be equivalent to 2147483647L + 1; type of e is long long or a wider type
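
The type selection described above can be approximated in today's C++ with templates. This is only a sketch under stated assumptions, not proposed wording: `smallest_signed_t` and `literal_sum` are hypothetical helpers, the candidate list stops at `long long` (no extended integer types), and `long long` is assumed to be wider than `int`.

```cpp
#include <climits>
#include <type_traits>

// Assumption: int + int always fits in long long.
static_assert(LLONG_MAX > INT_MAX, "sketch assumes long long is wider than int");

// Hypothetical helper: the first of int, long, long long whose range
// contains V, mirroring the way an integer literal's type is chosen.
template <long long V>
using smallest_signed_t =
    std::conditional_t<(V >= INT_MIN && V <= INT_MAX), int,
    std::conditional_t<(V >= LONG_MIN && V <= LONG_MAX), long, long long>>;

// Hypothetical emulation of "literal + literal" under the proposed rule:
// compute the mathematical result in long long, then give it the
// smallest type that can represent it.
template <int A, int B>
struct literal_sum {
    static constexpr long long wide = static_cast<long long>(A) + B;
    using type = smallest_signed_t<wide>;
    static constexpr type value = static_cast<type>(wide);
};

// A sum that fits in int keeps the type int.
static_assert(std::is_same<literal_sum<1, 2>::type, int>::value,
              "small results stay int");

// A sum that overflows int gets a wider type (long or long long,
// depending on the platform) instead of undefined behavior.
static_assert(!std::is_same<literal_sum<INT_MAX, 1>::type, int>::value,
              "overflowing results are widened");
```

Whether the widened type is `long` or `long long` depends on the platform, which matches the comment on `a` in the example above.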

Furthermore, extend this rule to unsigned integer literals.

MSVC actually refuses to compile `-UINT_MAX`, but GCC and Clang allow it.

For unsigned overflow, choose a sufficiently large unsigned integer type; for unsigned underflow, choose a sufficiently large signed integer type. The program is ill-formed if no integer type can store the value.
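
To make the unsigned case concrete, the sketch below contrasts today's wrapping results with the values the rule above would aim for, obtained here by widening by hand. The variable names are illustrative only, and the sketch assumes that `long long` can represent every `unsigned int` value. It uses `0u - UINT_MAX` rather than `-UINT_MAX` so that it does not depend on how a compiler treats unary minus on an unsigned operand.

```cpp
#include <climits>

// Assumption: long long can represent every unsigned int value.
static_assert(LLONG_MAX > UINT_MAX, "sketch assumes long long is wider than unsigned int");

// Today: unsigned arithmetic wraps modulo 2^N, so both are well-defined.
constexpr unsigned wrapped_sum = UINT_MAX + 1u;   // wraps around to 0
constexpr unsigned wrapped_neg = 0u - UINT_MAX;   // wraps around to 1

// Values the proposed rule would aim for, emulated by widening by hand:
// overflow goes to a larger unsigned type, underflow to a signed type.
constexpr unsigned long long widened_usum =
    static_cast<unsigned long long>(UINT_MAX) + 1u;
constexpr long long widened_uneg = 0LL - static_cast<long long>(UINT_MAX);

static_assert(wrapped_sum == 0u, "current result wraps to zero");
static_assert(wrapped_neg == 1u, "current result wraps to one");
static_assert(widened_usum - 1u == UINT_MAX, "widened sum is UINT_MAX + 1");
static_assert(widened_uneg < 0, "widened difference is negative");
```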

Compatibility

Even if existing code relies on the undefined behavior, implementing this change will not alter its result.

Users may see warnings of possible data loss when converting from large integers to small integers.

Using such expressions in templates will yield a different type than before, but I guess no one really does this.
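
As a rough illustration of the template case, the sketch below uses a hypothetical function template, `deduced_as_int`, whose result depends on the deduced argument type. Because the proposed rule does not exist in any compiler, an explicitly widened expression stands in for what `INT_MAX + 1` would become. It assumes `long long` is wider than `int`.

```cpp
#include <climits>
#include <type_traits>

// Assumption: long long can hold INT_MAX + 1.
static_assert(LLONG_MAX > INT_MAX, "sketch assumes long long is wider than int");

// Hypothetical template: reports whether T was deduced as int.
template <class T>
constexpr bool deduced_as_int(T) {
    return std::is_same<T, int>::value;
}

// Arithmetic on int operands deduces T = int today.
static_assert(deduced_as_int(INT_MAX - 1), "int arithmetic deduces int");

// Stand-in for `INT_MAX + 1` under the proposal: the argument would have
// a wider type, so T is no longer int and a different specialization
// of the template is instantiated.
static_assert(!deduced_as_int(static_cast<long long>(INT_MAX) + 1),
              "a widened result deduces a wider type");
```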

Since this operation produces a constant result at compile time, it does not affect optimization and does not depend on the platform. The reasons why signed overflow is still undefined behavior, as described on Stack Overflow<https://stackoverflow.com/questions/70801443/why-is-signed-overflow-due-to-computation-still-undefined-behavior-in-c20>, do not apply here.

For unsigned integer types, this may cause `0 - ULLONG_MAX` to change from well-defined to ill-formed, or give it the largest integer type that the implementation can represent, in which case the result remains well-defined.

Wording

Add new rules in expr.arith.conv<http://eel.is/c++draft/expr.arith.conv> or conv.prom<http://eel.is/c++draft/conv.prom> to cover these cases.

Received on 2023-04-27 19:35:06