Date: Mon, 12 Mar 2018 16:05:15 -0700

On Mon, Mar 12, 2018 at 2:36 PM, Lawrence Crowl <Lawrence_at_[hidden]> wrote:

> On 3/12/18, Myria <myriachan_at_[hidden]> wrote:

> > On Mon, Mar 12, 2018 at 13:32 Lawrence Crowl <Lawrence_at_[hidden]> wrote:

> >> On 3/12/18, Myria <myriachan_at_[hidden]> wrote:

> >>> The severity of the current situation is that I generally avoid signed

> >>> integers if I intend to do any arithmetic on them whatsoever, lest the

> >>> compiler decide to make demons come out of my nose.

> >>> And even then, I'm not safe:

> >>>

> >>> std::uint16_t x = 0xFFFF;

> >>> x *= x; // undefined behavior on most modern platforms

> >>

> >> How? The C++ standard defines unsigned arithmetic as

> >> modular arithmetic.

> >

> > But that's the catch: it's double secret signed arithmetic. [...]

> > On a "typical modern platform", std::uint16_t is unsigned short. [...]

> > 65535 * 65535 overflows a signed int on a typical 32-bit int platform,

> > which is undefined behavior.

>

> Good example.

>

Yes.

I have now added `uint16_t(65535) * uint16_t(65535)` as a row in the second

table in https://quuxplusone.github.io/draft/twosc-conservative.html. Highly

unfortunately, my "conservative" two's-complement idea would not fix it,

because multiplication is an arithmetic operation (not a bitwise operation)

and by the time the operation is happening, the standard integral

promotions have already kicked in, so the multiplication is happening on

signed quantities.

With JF Bastien's two's-complement proposal P0907R0, the multiplication

would take place in signed int as if -fwrapv were in effect, producing a

well-defined answer of `int(-131071)`. This is still the "wrong type," but

converting it back down to uint16_t is guaranteed to have the expected

effect even in present-day C++.
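
For concreteness, here is a minimal sketch of the "correct casts" workaround in present-day C++. The name `mul_wrap16` is made up for illustration. Both operands are converted to `unsigned int` before multiplying, so the multiplication is never performed on promoted signed values. The second assertion checks the narrowing step described above.

```cpp
#include <cstdint>

// Hypothetical helper (not from any library): multiply modulo 2^16.
// Casting to unsigned int first keeps the multiplication in unsigned
// arithmetic, which the standard defines as modular, so there is no UB.
constexpr std::uint16_t mul_wrap16(std::uint16_t a, std::uint16_t b) {
    return static_cast<std::uint16_t>(
        static_cast<unsigned int>(a) * static_cast<unsigned int>(b));
}

// 0xFFFF * 0xFFFF == 0xFFFE0001, whose low 16 bits are 0x0001.
static_assert(mul_wrap16(0xFFFF, 0xFFFF) == 1, "wraps modulo 2^16");

// Converting the wrapped signed result back down is modular as well:
// -131071 + 2 * 65536 == 1.
static_assert(static_cast<std::uint16_t>(-131071) == 1,
              "int-to-uint16_t conversion is modular");
```

Writing `x = mul_wrap16(x, x);` in place of `x *= x;` gives the wrap that Myria's hash code wanted.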

> >> More importantly, what happens to your program when x*x < x?

> >

> > The code that led me to finding this was a 16-bit variant of the FNV

> > hash function, so it worked properly after the correct casts were added

> > to allow the wrap.

>

> So the application intended modular arithmetic? I was concerned about

> the normal case where 'unsigned' is used to constrain the value range,

> not to get modular arithmetic.

>

IMNSHO, if anyone is using unsigned types "to constrain the value range,"

they are doing computers wrong. That is *not* what signed vs. unsigned

types are for.

As Lawrence himself wrote earlier in this thread:

> If integer overflow is undefined behavior, then it is wrong. Tools can detect

> wrong programs and report them.

The contrapositive is: "If the programmer is using a type where integer

overflow is well-defined to wrap, then we can assume that the program

relies on that wrapping behavior (because there would otherwise be a strong

incentive for the programmer to use a type that detects and reports

unintended overflow)."

The original design for the STL contained the "unsigned for value range"

antipattern. Consequently, they ran into trouble immediately: for example,

`std::string::find` <https://en.cppreference.com/w/cpp/string/basic_string/find>

returns an

index into the string, naturally of type `std::string::size_type`. But

size_type is unsigned! So instead of returning "negative 1" to indicate

the "not found" case, they had to make it return `size_type(-1)`, a.k.a.

`std::string::npos` — which is a positive value! This means that callers

have to write cumbersome things such as

if (s.find('k') != std::string::npos)

where it would be more natural to write

if (s.find('k') >= 0)

This is sort of parallel to my quotation of Lawrence above: If every

possible value in the domain of a given type is a valid output (e.g. from

`find`), then there is no value left over with which the function can

signal failure at runtime. And if every possible value in the domain is a

valid *input* (e.g. to `malloc`), then there is no way for the function to

detect incorrect input at runtime.
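
To make the `find` half of that concrete, here is a small sketch. The wrapper `find_index` is hypothetical and not part of the standard library. It assumes the string is no longer than `PTRDIFF_MAX`, and returns a signed index so that -1 is left over to mean "not found". Note that with today's unsigned `size_type`, `s.find('k') >= 0` still compiles, but it is always true.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// size_type is unsigned, so npos == size_type(-1) is its largest value.
static_assert(std::is_unsigned<std::string::size_type>::value,
              "size_type is unsigned");
static_assert(std::string::npos == static_cast<std::string::size_type>(-1),
              "npos is size_type(-1)");

// Hypothetical wrapper: signed index of the first c in s, or -1 if absent.
// Assumes s.size() fits in std::ptrdiff_t.
inline std::ptrdiff_t find_index(const std::string& s, char c) {
    const std::string::size_type pos = s.find(c);
    if (pos == std::string::npos) return -1;
    return static_cast<std::ptrdiff_t>(pos);
}

// With a signed result, the "natural" comparison means what it says.
inline void find_index_examples() {
    assert(find_index("book", 'k') >= 0);  // found, at index 3
    assert(find_index("book", 'z') < 0);   // not found
}
```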

If it weren't for the STL's `size_type` snafu continually muddying the

waters for new learners, I doubt people would be falling into the "unsigned

for value range" antipattern anymore.

–Arthur


Received on 2018-03-13 00:05:18