sg12: Re: [ub] ub due to left operand of shift

From: Lawrence Crowl <Lawrence_at_[hidden]>
Date: Sun, 27 Oct 2013 21:31:23 -0700

On 10/26/13, Howard Hinnant <howard.hinnant_at_[hidden]> wrote:
> On Oct 26, 2013, at 5:16 PM, Jean-Marc Bourguet <jm_at_[hidden]> wrote:
>> On 26/10/2013 20:09, Howard Hinnant wrote:
>>> Might we reword [expr.shift]/p2 in terms of shifting 1 bits off of the
>>> most significant end, instead of in terms of E1*2^E2?
>>>
>>> The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits
>>> are zero-filled. <del>If E1 has an unsigned type, the value of the result
>>> is E1 × 2^E2, reduced modulo one more than the maximum value
>>> representable in the result type. Otherwise, if E1 has a signed type and
>>> non-negative value, and E1 × 2^E2 is representable in the corresponding
>>> unsigned type of the result type, then that value, converted to the
>>> result type, is the resulting value; otherwise</del><ins>If E1 has signed
>>> type and the result would have fewer non-zero bits than E1</ins>, the
>>> behavior is undefined.

The original definition fails for the earlier example.

int adjust( unsigned char E1 ) { return E1 << 24; }

E1 is unsigned, and therefore the text says to compute mod INT_MAX rather
than UINT_MAX. This behavior is clearly not what anyone is doing.

The problem I have with the revised definition is that it is based on bits,
not values, and presumes a representation. Even if we do define a
representation, I would rather see a definition on values. For example, I
think the following meets the intent above as I understand it.

The mathematical value of E1 << E2 is E1 × 2^E2.
Its reduced value is the mathematical value
reduced modulo one more than the maximum value of
the corresponding unsigned type of the result type.
If the candidate value is representable in the result type,
the result value is the mathematical value.
Otherwise, if the result type is unsigned,
the result value is the reduced value.
Otherwise [footnote: the result type is signed],
if the mathematical value
is representable in the corresponding unsigned type of the result type,
the result value is the reduced value converted to the result type.
Otherwise, the behavior is undefined.

Note there are two other definitions that are useful to programmers.
These definitions are simpler and more programmable, but lack compatibility,
so we probably cannot change existing programs to those definitions.

Strong Unsigned and Signed:
The mathematical value of E1 << E2 is E1 × 2^E2.
If the candidate value is representable in the result type,
the result value is the mathematical value.
Otherwise, the behavior is undefined.

Strong Signed (the old definition but shift by zero defined):
The mathematical value of E1 << E2 is E1 × 2^E2.
If the candidate value is representable in the result type,
the result value is the mathematical value.
Otherwise, if the result type is unsigned,
the result value is the the mathematical value
reduced modulo one more than the maximum value of
the corresponding unsigned type of the result type.
Otherwise [footnote: the result type is signed],
the behavior is undefined.

>> If I'm not mistaken, the only case you seem to define is a left shift by
>> 0 of a negative value. (I'd not mind converting E1 to the corresponding
>> unsigned type, do the shift, then converting back to the signed type; I
>> know of two machines which have an arithmetic shift left instruction
>> with a sticky sign bit -- System/360 and PDP-10 --, and if I'm not
>> mistaken C and C++ compilers for them are using the logical shift when
>> shifting left -- someone with an access to a zSeries could check if it
>> hasn't been done when processing DR1457, the PDP-10 has been irrelevant
>> for about 3 decades).
>
> Agreed. My intent was to clean up the wording by ceasing to describe the
> result in two ways:
>
> 1. The value of E1 << E2 is E1 left-shifted E2 bit positions;
> vacated bits are zero-filled.

This is a recursive definition. What happens on little-endian machines
where the high-order bits are on the right (as conventionally read)?

> 2. If E1 has an unsigned type, the value of the result is E1 × 2^E2,
> reduced modulo one more than the maximum value representable in the result
> type. Otherwise, if E1 has a signed type and non-negative value, and E1 ×
> 2^E2 is representable in the corresponding unsigned type of the result type,
> then that value, converted to the result type, is the resulting value.
>
> We only need to describe the resulting value in one way. And a side-effect
> of this clean up is to fix what appears to me to be an obvious bug (left
> shifting a negative value by 0 should not be undefined behavior).

I agree that that wording needs to be fixed.

>
> I am also of the opinion that we should clean up p3 in the same way, and for
> the same reasons, though in this case there is no bug fix:
>
> The value of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an
> unsigned type or if E1 has a signed type and a non-negative value, <del>the
> value of the result is the integral part of the quotient of E1/2^E2</del>
> <ins>vacated bits are zero-filled</ins>. If E1 has a signed type and a
> negative value, the <del>resulting</del> value <ins>of the vacated bits</ins>
> is implementation-defined.

I would prefer the following.

The mathematical value of E1 >> E2 is E1 / 2^E2.
If the mathematical value is non-negative,
the result value is the floor of the mathematical value.
Otherwise [footnote: the result type is signed],
the result value is implementation-defined.

However, I think we can go further and say

The mathematical value of E1 >> E2 is E1 / 2^E2.
If the mathematical value is non-negative,
the result value is the floor of the mathematical value.
Otherwise [footnote: the result type is signed],
the result value is implementation-defined,
but bounded by the floor and ceiling of the mathematical value.

> Are these changes consistent with the existing behavior of System/360 and
> PDP-10?
>
> I can vouch that they are consistent with the behavior of clang on OS X,
> with the exception that I pointed out with respect to constexpr and
> left-shift-negative-by-zero (which again, just seems like a bug to me).
>
> I've programmatically specified what I believe should be the undefined
> behavior of the current specification modulo the left-shift-by-zero bug in
> the code below (std::__clz(x) is a count-leading-zeros function, which if we
> had one, would be another way to specify the range of well-defined
> behavior). The purpose of this exercise is to help identify and clarify any
> ambiguity that might exist in my modified English specification above.

You need asserts on the return value that do not appeal to the shift
operators.

>
> constexpr
> unsigned
> lshift(unsigned x, unsigned long long y)
> {
> assert(y < std::numeric_limits<unsigned>::digits);
> return x << y;
> }
>
> constexpr
> unsigned long
> lshift(unsigned long x, unsigned long long y)
> {
> assert(y < std::numeric_limits<unsigned long>::digits);
> return x << y;
> }
>
> constexpr
> unsigned long long
> lshift(unsigned long long x, unsigned long long y)
> {
> assert(y < std::numeric_limits<unsigned long long>::digits);
> return x << y;
> }
>
> constexpr
> int
> lshift(int x, unsigned long long y)
> {
> assert(y < std::numeric_limits<unsigned>::digits);
> assert(x == 0 || y <= std::__clz(static_cast<unsigned>(x)));
> return x << y;
> }
>
> constexpr
> long
> lshift(long x, unsigned long long y)
> {
> assert(y < std::numeric_limits<unsigned long>::digits);
> assert(x == 0 || y <= std::__clz(static_cast<unsigned long>(x)));
> return x << y;
> }
>
> constexpr
> long long
> lshift(long long x, unsigned long long y)
> {
> assert(y < std::numeric_limits<unsigned long long>::digits);
> assert(x == 0 || y <= std::__clz(static_cast<unsigned long long>(x)));
> return x << y;
> }
>
> constexpr
> unsigned
> rshift(unsigned x, unsigned long long y)
> {
> assert(y < std::numeric_limits<unsigned>::digits);
> return x >> y;
> }
>
> constexpr
> unsigned long
> rshift(unsigned long x, unsigned long long y)
> {
> assert(y < std::numeric_limits<unsigned long>::digits);
> return x >> y;
> }
>
> constexpr
> unsigned long long
> rshift(unsigned long long x, unsigned long long y)
> {
> assert(y < std::numeric_limits<unsigned long long>::digits);
> return x >> y;
> }
>
> constexpr
> int
> rshift(int x, unsigned long long y)
> {
> assert(y < std::numeric_limits<unsigned>::digits);
> return x >> y;
> }
>
> constexpr
> long
> rshift(long x, unsigned long long y)
> {
> assert(y < std::numeric_limits<unsigned long>::digits);
> return x >> y;
> }
>
> constexpr
> long long
> rshift(long long x, unsigned long long y)
> {
> assert(y < std::numeric_limits<unsigned long long>::digits);
> return x >> y;
> }

-- 
Lawrence Crowl

Received on 2013-10-28 05:31:25