std-discussion

Re: UB in P2641 'Checking if a union alternative is active'

From: Jens Maurer <jens.maurer_at_[hidden]>
Date: Tue, 20 Jun 2023 08:27:31 +0200
On 20/06/2023 04.11, Matthew House via Std-Discussion wrote:
> On Mon, Jun 19, 2023 at 8:46 PM Brian Bi via Std-Discussion
> <std-discussion_at_[hidden]> wrote:
>>> More generally, such an interpretation would completely break
>>> mechanisms enabled by [basic.lval]/11 using pointers that happen to
>>> refer to union members, since inactive union members would always have
>>> preference over reinterpretations allowed by the rule. For instance,
>>> suppose that u.c were declared as an unsigned char instead of a char.
>>> Then, std::memcpy(dest, &u.i, sizeof(int)) would be UB, since by
>>> reinterpreting its argument as an array of unsigned char, memcpy would
>>> produce a pointer to u.c, then read past its end. I don't think that's
>>> something that can be considered reasonable.
>>
>> Well, `std::memcpy` can be defined by magic to do the right thing, but I guess you're talking about a user-written analogue. Still, I don't understand your argument. Under the current wording, you can't write such a thing and have it have well-defined behavior according to the letter of the law, regardless of what view you take on whether the `u.c` object exists when it's not active.
>
> I'll admit, I don't understand the argument that P1839 seems to hinge
> on, to argue that even reading the first byte of the object
> representation is UB:
>
>> When a is dereferenced, the behaviour is undefined as per [expr.pre]
>> p4 because the value of the resulting expression would not be the
>> value of the first byte, but the value of the whole int object
>> (123456), which is not a value representable by unsigned char.

In recent years, we've come to understand better that "the object the pointer
points to" may be different from "the pointee of the type of the pointer".

For example, when casting a pointer to T to a pointer to void, the pointer
still points to a T object, although the type of the expression doesn't
say so. Or, by chaining two static_casts, you can actually have a pointer
of type "pointer to char" have a value that actually points to an object of
type int.
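
A minimal sketch of that cast chain, for illustration only. The helper names as_char_pointer and back_to_int are hypothetical and not from the thread. The sketch assumes the C++17-and-later static_cast rules, under which the pointer value is unchanged when no pointer-interconvertible char object exists:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: int* -> void* -> char* via two conversions.
inline char* as_char_pointer(int* p) {
    void* pv = p;                    // pv still points to the int object
    return static_cast<char*>(pv);   // type is char*, pointer value is unchanged
}

// Hypothetical helper: reverses the chain to recover the int pointer.
inline int* back_to_int(char* pc) {
    return static_cast<int*>(static_cast<void*>(pc));
}

// The static type of the result says "char", whatever the pointer points to.
static_assert(std::is_same_v<decltype(as_char_pointer(nullptr)), char*>);
```

The round trip recovering the original pointer is consistent with the char* value having pointed to the int object all along, even though its type does not say so.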

> This interpretation appears to defeat the entire purpose of
> the first sentence in [basic.lval]/11, which I will repeat here for
> reference:
>
>> If a program attempts to access (3.1) the stored value of an object
>> through a glvalue whose type is not similar (7.3.6) to one of the
>> following types the behavior is undefined:
>> - the dynamic type of the object,
>> - a type that is the signed or unsigned type corresponding to the
>> dynamic type of the object, or
>> - a char, unsigned char, or std::byte type.

> I have always imagined this as implying a series of steps for
> performing a read where the type of the glvalue is not similar to the
> dynamic type of the object:
> 1. Locate the object referred to by the glvalue.
> 2. Select the appropriate bytes in the object representation.

That's exactly the problem: There is no talk about "object representation"
in the existing text here.

> 3. Interpret those bytes as a value of the glvalue's type.
> 4. Return the resulting value.
> (The reverse process would occur for a modification.)
>
> Indeed, [basic.lval]/11 originates from an analogous clause in
> standard C, which at another point explicitly clarifies the supremacy
> of the lvalue's type: "The meaning of a value stored in an object or
> returned by a function is determined by the *type* of the expression
> used to access it."

We can't have this in C++, because you could always have a pointer to a
base class refer to an object that is actually of a derived class type.
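
A small sketch of that situation. The types Base and Derived and both helper functions are made up for illustration. The expression *p has static type Base, yet what happens on access depends on the dynamic type:

```cpp
#include <cassert>

struct Base {
    virtual ~Base() = default;
    virtual int id() const { return 1; }
};

struct Derived : Base {
    int id() const override { return 2; }
};

// The type of the expression *p is Base, but the call dispatches on the
// dynamic type of the object p refers to.
inline int dynamic_id(const Base* p) { return p->id(); }

// True when the pointer refers to (a base subobject of) a Derived object.
inline bool refers_to_derived(const Base* p) {
    return dynamic_cast<const Derived*>(p) != nullptr;
}
```

So the type of the access expression alone cannot determine the meaning of the stored value in C++, unlike in the C wording quoted above.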

> But C++ isn't so clear about the result when an object is
> reinterpreted as another type using [basic.lval]/11. Apart from
> [basic.lval]/11 itself, the most relevant language I could find is in
> [conv.lval]/3:
>
>> The result of the conversion is determined according to the
>> following rules:
>> [...]
>> - Otherwise, the object indicated by the glvalue is read (3.1), and
>> the value contained in the object is the prvalue result.
>
> "The object indicated by the glvalue" is surely the object that the
> glvalue refers to, but "the value contained in the object" is somewhat
> ambiguous, especially since the clause references no mechanism for
> converting to the glvalue's type. Is "the value contained" exactly the
> value of the object in its dynamic type? Or is "the value contained"
> the value resulting from interpreting the object representation as a
> value of the glvalue's type? P1839 briefly assumes the former, but I
> don't see how that interpretation can be squared with the purpose of
> [basic.lval]/11.

See, it's not so easy.
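
For comparison, a sketch of the route that is uncontroversial today: copying the object representation with std::memcpy rather than reading it through an unsigned char glvalue. The helper names bytes_of and int_from are hypothetical, and the value 123456 used to exercise them is borrowed from the P1839 excerpt quoted earlier:

```cpp
#include <array>
#include <cassert>
#include <cstring>

using IntBytes = std::array<unsigned char, sizeof(int)>;

// Hypothetical helper: copies the object representation of an int into bytes.
inline IntBytes bytes_of(const int& x) {
    IntBytes out{};
    std::memcpy(out.data(), &x, sizeof(int));
    return out;
}

// Hypothetical helper: rebuilds an int from a previously copied representation.
inline int int_from(const IntBytes& bytes) {
    int x;
    std::memcpy(&x, bytes.data(), sizeof(int));
    return x;
}
```

For example, int_from(bytes_of(123456)) gives back 123456. The sketch does not settle which reading of "the value contained in the object" applies to a direct access through an unsigned char glvalue.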

> (If we were to make the second interpretation explicit in
> [conv.lval]/3 and [expr.ass]/2, it would obviate the first problem
> brought up in the paper. Yet the problem of allowing pointer
> arithmetic with an unsigned char* pointer to a general object would
> remain. But the paper's proposal seems quite ugly to me; in my view,
> this would be more cleanly solved by introducing a new kind of
> pointer, ...

The goal is to make code that "should work" (because it has worked
in C and C++ forever) just work by putting a suitable model underneath
it, not to introduce new kinds of pointers (which would not help
existing code).

Jens

Received on 2023-06-20 06:27:35