sg12

Re: [ub] ub due to left operand of shift

From: Chandler Carruth <chandlerc_at_[hidden]>
Date: Thu, 24 Oct 2013 10:44:15 -0700
On Thu, Oct 24, 2013 at 10:09 AM, John Regehr <regehr_at_[hidden]> wrote:

> The C99 standard, and also the latest working drafts of C11 and C++11
> that I know of, effectively forbid shifting a 1 bit into, out of, or
> through the sign bit. But also we have this, which proposes
> strengthening the behavior to allow shifting into the sign bit but not
> out of it or past it:
>
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3367.html#1457
>

Note that I believe C++14 will incorporate this fix.
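
To make the three cases concrete, here is a rough sketch that classifies a left shift of an int without actually performing it. The names ShiftStatus and classify_shl are made up for illustration; the logic assumes the DR 1457 wording (the result must be representable in the corresponding unsigned type) and an unsigned int with no padding bits.

```cpp
#include <limits>

// Hypothetical names, for illustration only.
enum class ShiftStatus {
    DefinedBefore,    // result fits in int: defined under both old and new wording
    DefinedByDR1457,  // result fits only in unsigned int: a 1 bit lands in the sign bit
    Undefined         // negative operand, bad count, or a 1 bit shifted past the sign bit
};

// Classify `value << count` for int operands without evaluating the shift.
ShiftStatus classify_shl(int value, int count) {
    // Assumption: unsigned int has no padding bits, so digits == width of int.
    constexpr int width = std::numeric_limits<unsigned>::digits;
    if (value < 0 || count < 0 || count >= width)
        return ShiftStatus::Undefined;

    const unsigned u = static_cast<unsigned>(value);
    // value * 2^count fits in unsigned int iff u <= UINT_MAX >> count.
    if (u > (std::numeric_limits<unsigned>::max() >> count))
        return ShiftStatus::Undefined;
    // value * 2^count fits in int iff u <= INT_MAX >> count.
    if (u <= (static_cast<unsigned>(std::numeric_limits<int>::max()) >> count))
        return ShiftStatus::DefinedBefore;
    return ShiftStatus::DefinedByDR1457;
}
```

With a 32-bit int, 1 << 30 is defined either way, 1 << 31 is the case the fix makes defined, and 2 << 31 stays undefined.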

I agree with this fix because it seems totally unsurprising, etc. However,
I don't agree with a lot of your points below....

> Here are a few observations about this class of undefined behavior:
>
> - As one of the people who helped get good integer undefined behavior
> checking into Clang 3.3, I ran a large amount of open source software
> with these checks turned on. Basically every large open source program
> is undefined due to the LHS rules for signed left shift.
>

Did you distinguish between those due to shift-into-sign-bit, and those
which actually shifted completely off the top?


> - These undefined behaviors are extremely surprising to developers.
> Moreover, developers do not care about them. In fact, we stopped
> reporting them as bugs because this was hurting our credibility as
> providers of useful, previously-unknown information about potential
> application bugs.
>

Except for the sign-bit, every case of this we have found has been either a
bug in the user code, or has been an encoding algorithm that made rampant
assumptions about the underlying machine architecture.

In the first case, the authors greatly appreciated the results. In the
second case, the code is made more portable and explicit by using unsigned
integers, and its authors routinely appreciate this change.
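
As an illustration of the second case, here is a minimal sketch of a byte-packing routine written with unsigned operands. The function name load_be32 is hypothetical and not taken from any of the code we looked at.

```cpp
#include <cstdint>

// Hypothetical helper: assemble a big-endian 32-bit value from four bytes.
// Without the casts, each p[i] would be promoted to int, and p[0] << 24
// with p[0] >= 0x80 would shift a 1 into the sign bit of a 32-bit int.
std::uint32_t load_be32(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8)  |
            static_cast<std::uint32_t>(p[3]);
}
```

The casts spell out the intended width and signedness, so the routine does not depend on how the implementation treats signed shifts.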


> - Every C compiler that I have used provides the semantics that
> developers expect: signed left-shift works the same as unsigned left
> shift. I wrote a number of undefined test cases where the compiler could
> generate better code due to the ub and no compiler did this. This was a
> while ago but I recall trying Intel CC, GCC, and LLVM.
>

We have specific optimizations that we would like to do in LLVM that would
be enabled by this but have not yet done due to priorities and needing to
clean up the bugs that it trips over. I don't think these optimizations are
hypothetical or unimportant.


> - Non-two's complement platforms are all but nonexistent.
>

The committee has recently given serious consideration to the behavior of
C++ on Unisys machines that are still in use... I'm not familiar with them,
but I hesitate to make this assumption.


> Given all of the above, I would propose strengthening the semantics of
> signed left shift even farther than Howard Hinnant, and simply
> eliminating these highly surprising undefined behaviors and instead
> specifying that the result of a signed left-shift is the same as would
> be obtained using the equivalent unsigned type.
>

I'm strongly opposed to this.

- We have found a very large number of bugs by checking left shift of
signed integers.
- There are real optimization opportunities here.
- We can provide tools to developers that allow them to find all such bugs,
most at compile time, and the rest at runtime.
- There may be non-trivial portability problems. Architectures where rotate
is significantly cheaper than shift[1] would be penalized by this change,
for example.

Essentially, it satisfies all of my criteria for being good and important
UB in the language.


[1]: I don't know of any such architecture, but I don't claim exhaustive
knowledge.

Received on 2013-10-24 19:44:18