On Mon, Mar 12, 2018 at 2:36 PM, Lawrence Crowl <Lawrence@crowl.org> wrote:

On 3/12/18, Myria <myriachan@gmail.com> wrote:
> On Mon, Mar 12, 2018 at 13:32 Lawrence Crowl <Lawrence@crowl.org> wrote:
>> On 3/12/18, Myria <myriachan@gmail.com> wrote:
>>> The severity of the current situation is that I generally avoid signed
>>> integers if I intend to do any arithmetic on them whatsoever, lest the
>>> compiler decide to make demons come out of my nose.
>>> And even then, I'm not safe:
>>>
>>> std::uint16_t x = 0xFFFF;
>>> x *= x; // undefined behavior on most modern platforms
>>
>> How? The C++ standard defines unsigned arithmetic as
>> modular arithmetic.
>
> But that's the catch: it's double secret signed arithmetic. [...]
> On a "typical modern platform", std::uint16_t is unsigned short. [...]
> 65535 * 65535 overflows a signed int on a typical 32-bit int platform,
> which is undefined behavior.

Good example.

Yes.

I have now added `uint16_t(65535) * uint16_t(65535)` as a row in the second table in https://quuxplusone.github.io/draft/twosc-conservative.html. Unfortunately, my "conservative" two's-complement idea would not fix it: multiplication is an arithmetic operation (not a bitwise operation), and by the time it happens the standard integral promotions have already kicked in, so the multiplication is performed on signed quantities.

With JF Bastien's two's-complement proposal P0907R0, the multiplication would take place in signed int as if -fwrapv were in effect, producing a well-defined answer of `int(-131071)`. This is still the "wrong type," but converting it back down to uint16_t is guaranteed to have the expected effect even in present-day C++.
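To make the promotion issue concrete, here is a minimal sketch of the kind of cast that sidesteps it in present-day C++. The helper name `mul_u16` is hypothetical (it is not from this thread or from either proposal), and the comments assume a typical platform where `int` is 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: multiply two 16-bit values modulo 2^16 without
// ever performing the arithmetic in signed int.
constexpr std::uint16_t mul_u16(std::uint16_t a, std::uint16_t b) {
    // Converting one operand to unsigned int makes the usual arithmetic
    // conversions choose unsigned int, where overflow wraps by definition.
    return static_cast<std::uint16_t>(static_cast<unsigned int>(a) * b);
}

// The wrapped signed result mentioned above, -131071, converts back to
// uint16_t as 1, because conversion to an unsigned type is modular.
static_assert(static_cast<std::uint16_t>(-131071) == 1,
              "-131071 is congruent to 1 modulo 65536");

// Runtime spot checks: 65535 * 65535 == 4294836225, which is 1 mod 65536.
inline void check_mul_u16() {
    assert(mul_u16(0xFFFF, 0xFFFF) == 1);
    assert(mul_u16(3, 5) == 15);
}
```

Writing `x = mul_u16(x, x);` in place of `x *= x;` gives the modular result without ever forming a signed product.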

>> More importantly, what happens to your program when x*x < x?
>
> The code that led me to finding this was a 16-bit variant of the FNV
> hash function, so it worked properly after the correct casts were added
> to allow the wrap.

So the application intended modular arithmetic? I was concerned about
the normal case where 'unsigned' is used to constrain the value range,
not to get modular arithmetic.

IMNSHO, if anyone is using unsigned types "to constrain the value range," they are doing computers wrong. That is not what signed vs. unsigned types are for.

As Lawrence himself wrote earlier in this thread:

> If integer overflow is undefined behavior, then it is wrong. Tools can detect wrong programs and report them.

The flip side is: "If the programmer is using a type where integer overflow is well-defined to wrap, then we can assume that the program relies on that wrapping behavior (because there would otherwise be a strong incentive for the programmer to use a type that detects and reports unintended overflow)."

The original design for the STL contained the "unsigned for value range" antipattern. Consequently, its designers ran into trouble immediately: for example, `std::string::find` returns an index into the string, naturally of type `std::string::size_type`. But size_type is unsigned! So instead of returning "negative 1" to indicate the "not found" case, they had to make it return `size_type(-1)`, a.k.a. `std::string::npos`, which is a positive value! This means that callers have to write cumbersome things such as

if (s.find('k') != std::string::npos)

where it would be more natural to write

if (s.find('k') >= 0)

This is sort of parallel to my quotation of Lawrence above: If every possible value in the domain of a given type is a valid output (e.g. from `find`), then there is no value left over with which the function can signal failure at runtime. And if every possible value in the domain is a valid input (e.g. to `malloc`), then there is no way for the function to detect incorrect input at runtime.
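As a rough illustration of the `find` half of that point, here is a sketch of what a signed interface could look like. The wrapper `find_signed` is hypothetical (nothing like it exists in the standard library), and it assumes the string is short enough that every valid index fits in `std::ptrdiff_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// npos is size_type(-1): the largest value of the type, not a negative one.
static_assert(std::string::npos > 0, "npos is a positive value");

// Hypothetical wrapper: report the position as a signed index, with -1
// left over to signal "not found".
inline std::ptrdiff_t find_signed(const std::string& s, char c) {
    const std::string::size_type pos = s.find(c);
    if (pos == std::string::npos) {
        return -1;
    }
    return static_cast<std::ptrdiff_t>(pos);
}

// Runtime spot checks for the wrapper.
inline void check_find_signed() {
    assert(find_signed("book", 'k') == 3);
    assert(find_signed("book", 'z') < 0);
}
```

With a signed result, the caller can write `if (find_signed(s, 'k') >= 0)` and have it mean what it says.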

If it weren't for the STL's `size_type` snafu continually muddying the waters for new learners, I doubt people would be falling into the "unsigned for value range" antipattern anymore.

–Arthur