sg12: Re: [ub] A proposal to define signed overflow submitted?

From: Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]>
Date: Mon, 12 Mar 2018 16:05:15 -0700

On Mon, Mar 12, 2018 at 2:36 PM, Lawrence Crowl <Lawrence_at_[hidden]> wrote:

> On 3/12/18, Myria <myriachan_at_[hidden]> wrote:
> > On Mon, Mar 12, 2018 at 13:32 Lawrence Crowl <Lawrence_at_[hidden]> wrote:
> >> On 3/12/18, Myria <myriachan_at_[hidden]> wrote:
> >>> The severity of the current situation is that I generally avoid signed
> >>> integers if I intend to do any arithmetic on them whatsoever, lest the
> >>> compiler decide to make demons come out of my nose.
> >>> And even then, I'm not safe:
> >>>
> >>> std::uint16_t x = 0xFFFF;
> >>> x *= x; // undefined behavior on most modern platforms
> >>
> >> How? The C++ standard defines unsigned arithmetic as
> >> modular arithmetic.
> >
> > But that's the catch: it's double secret signed arithmetic. [...]
> > On a "typical modern platform", std::uint16_t is unsigned short. [...]
> > 65535 * 65535 overflows a signed int on a typical 32-bit int platform,
> > which is undefined behavior.
>
> Good example.
>

Yes.
I have now added `uint16_t(65535) * uint16_t(65535)` as a row in the second
table in https://quuxplusone.github.io/draft/twosc-conservative.html.
Unfortunately, my "conservative" two's-complement idea would not fix it,
because multiplication is an arithmetic operation (not a bitwise operation),
and by the time the operation happens, the standard integral promotions
have already converted both operands to int, so the multiplication is
performed on signed quantities.
With JF Bastien's two's-complement proposal P0907R0, the multiplication
would take place in signed int as if -fwrapv were in effect: 65535 * 65535 =
4294836225 wraps to 4294836225 - 2^32, a well-defined answer of
`int(-131071)`. This is still the "wrong type," but converting it back down
to uint16_t is guaranteed to yield the expected value, 1, even in
present-day C++, because conversion to an unsigned type is reduction modulo
2^16.
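A minimal C++ sketch of the pitfall and the present-day workaround, assuming a typical platform with 32-bit `int` (so `std::uint16_t` is `unsigned short` and promotes to `int`). The name `square_mod_2_16` is hypothetical. The idea is that converting one operand to `unsigned int` before multiplying keeps the whole computation in unsigned (modular) arithmetic, so no signed overflow can occur.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Assumes int is wider than 16 bits, so std::uint16_t operands are promoted
// to (signed) int before any arithmetic happens.
static_assert(std::is_same<decltype(std::uint16_t{} * std::uint16_t{}), int>::value,
              "uint16_t * uint16_t is computed in int on this platform");

// Hypothetical helper: squares x modulo 2^16 without signed overflow.
// Writing `x * x` directly multiplies two ints; 65535 * 65535 = 4294836225
// exceeds INT_MAX (2147483647) on a 32-bit-int platform, which is UB today.
std::uint16_t square_mod_2_16(std::uint16_t x) {
    // int and unsigned int have the same rank, so the usual arithmetic
    // conversions turn the other (promoted) operand into unsigned int too,
    // and the multiplication is well-defined modular arithmetic.
    unsigned int product = static_cast<unsigned int>(x) * x;
    // Conversion to an unsigned type is always reduction modulo 2^N.
    return static_cast<std::uint16_t>(product);
}
```

For example, `square_mod_2_16(0xFFFF)` yields 1, the same value that `static_cast<std::uint16_t>(-131071)` produces from the wrapped `int` result described above.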

> >> More importantly, what happens to your program when x*x < x?
> >
> > The code that led me to finding this was a 16-bit variant of the FNV
> > hash function, so it worked properly after the correct casts were added
> > to allow the wrap.
>
> So the application intended modular arithmetic? I was concerned about
> the normal case where 'unsigned' is used to constrain the value range,
> not to get modular arithmetic.
>

IMNSHO, if anyone is using unsigned types "to constrain the value range,"
they are doing computers wrong. That is *not* what signed vs. unsigned
types are for.

As Lawrence himself wrote earlier in this thread:
> If integer overflow is undefined behavior, then it is wrong. Tools can detect
> wrong programs and report them.
The flip side is: "If the programmer is using a type where integer
overflow is well-defined to wrap, then we can assume that the program
relies on that wrapping behavior (because there would otherwise be a strong
incentive for the programmer to use a type that detects and reports
unintended overflow)."
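As a hypothetical illustration of code that relies on wrapping (Myria's actual hash code is not shown in this thread), here is a sketch of a 16-bit FNV-style hash built by computing 32-bit FNV-1a and xor-folding the result down to 16 bits. It assumes 32-bit `int`, so `std::uint32_t` is `unsigned int` and is not promoted to a signed type. The choice of an unsigned type is exactly what tells the reader that the wraparound is intended.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical 16-bit FNV-style hash: 32-bit FNV-1a, then xor-folded.
// Requires C++17 for std::string_view.
std::uint16_t fnv1a_16(std::string_view data) {
    std::uint32_t h = 2166136261u;      // FNV-1a 32-bit offset basis
    for (unsigned char c : data) {
        h ^= c;
        h *= 16777619u;                 // FNV 32-bit prime; wraps mod 2^32 on purpose
    }
    // Fold the high half into the low half to get 16 bits.
    return static_cast<std::uint16_t>((h >> 16) ^ (h & 0xFFFFu));
}
```

With these assumptions, `fnv1a_16("")` is `0x1CD9` (the folded offset basis) and `fnv1a_16("a")` is `0xCD20`.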

The original design for the STL contained the "unsigned for value range"
antipattern, and its designers ran into trouble immediately. For example,
`std::string::find` <https://en.cppreference.com/w/cpp/string/basic_string/find>
returns an index into the string, naturally of type `std::string::size_type`.
But size_type is unsigned! So instead of returning "negative 1" to indicate
the "not found" case, `find` has to return `size_type(-1)`, a.k.a.
`std::string::npos`, which is the largest possible positive value! This
means that callers have to write cumbersome things such as

if (s.find('k') != std::string::npos)

where it would be more natural to write

if (s.find('k') >= 0)
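A sketch contrasting the two spellings. `contains` uses the real `npos` comparison; `find_index` is a hypothetical signed wrapper, not a standard API, showing the interface that the `>= 0` test would need. With the real unsigned `size_type`, `s.find('k') >= 0` compiles but is always true, so it silently accepts the not-found case (GCC's `-Wtype-limits`, enabled by `-Wextra`, can flag such comparisons).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Correct test with the real API: "not found" is npos == size_type(-1).
bool contains(const std::string& s, char ch) {
    return s.find(ch) != std::string::npos;
}

// Hypothetical signed wrapper: index of ch, or -1 if absent.
// With this return type, `find_index(s, 'k') >= 0` means what it says.
std::ptrdiff_t find_index(const std::string& s, char ch) {
    std::string::size_type pos = s.find(ch);
    return pos == std::string::npos ? -1 : static_cast<std::ptrdiff_t>(pos);
}
```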

This is sort of parallel to my quotation of Lawrence above: If every
possible value in the domain of a given type is a valid output (e.g. from
`find`), then there is no value left over with which the function can
signal failure at runtime. And if every possible value in the domain is a
valid *input* (e.g. to `malloc`), then there is no way for the function to
detect incorrect input at runtime.
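A sketch of the `malloc` half of the argument, assuming a caller that computes a size as a signed `int`. `as_size` and `checked_alloc` are hypothetical helpers, not standard APIs. Because `checked_alloc` takes a signed byte count, a negative request falls outside the meaningful range and can be rejected; `std::malloc`, by contrast, receives `SIZE_MAX` for an input of -1 and has no way to tell it apart from a genuine (if unsatisfiable) request.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Converting a negative int to size_t wraps to a huge value: every size_t
// is a "valid" size, so the mistake is invisible to the callee.
std::size_t as_size(int n) { return static_cast<std::size_t>(n); }

// Hypothetical allocator with a signed parameter: negative inputs are
// detectably wrong and are rejected instead of being passed to malloc.
void* checked_alloc(std::ptrdiff_t bytes) {
    if (bytes < 0) return nullptr;      // caller bug detected at runtime
    return std::malloc(static_cast<std::size_t>(bytes));
}
```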

If it weren't for the STL's `size_type` snafu continually muddying the
waters for new learners, I doubt people would be falling into the "unsigned
for value range" antipattern anymore.

–Arthur

Received on 2018-03-13 00:05:18