sg12: Re: [ub] type punning through congruent base class?

From: Kazutoshi Satoda <k_satoda_at_[hidden]>
Date: Sun, 19 Jan 2014 03:23:15 +0900

On 2014/01/17 13:58 +0900, Gabriel Dos Reis wrote:
> Kazutoshi Satoda <k_satoda_at_[hidden]> writes:
> | On 2014/01/17 5:53 +0900, Gabriel Dos Reis wrote:
> | > On 2014/01/17 5:01 +0900, James Dennett wrote:
> | > | struct B { int x; };
> | > | struct B* p = (B*) malloc(sizeof(B));
> | > | p->x = 17;
> | > |
> | > | This (modulo the cast) has been how C has handled dynamic allocation
> | > | of structs approximately forever. There are no constructors in C,
> | > | structs don't get initialized, their fields just get assigned to.
(snip)
> | "p->x = 17" makes the effective type of the object between &p->x and
> | (&p->x + 1) int (6.5 p6), and stores value 17 (a value
> | representation of int which represents 17) into that object (6.5.16).
>
> Agreed. In particular, at this point, we still do not have an object of
> type B -- from C's perspective.
>
> | (Note)
> | In C, there is no such thing as "an object of type int", unlike in
> | C++ where an object has a type and the type is determined when the
> | object is created. In C, an "object" is merely a region of data
> | storage, and may be labeled by an effective type which is determined
> | by ongoing or previous access to the object at a time.
>
> Yes. This goes back to the comment I made:
>
> # Even if we take the C model, it is not clear what is there. We can
> # infer that after the statement 'p->x = 17;' the storage at address
> # '&p->x' has effective type 'int', but that does not say much about the
> # rest. I -suspect- C is OK with that because fundamentally, its object
> # model is structural.

Here is an example that (I think) illustrates a real-world problem
caused by the assertion that "the effective type of the object is just
int and not B." Please see this code and the result with gcc 4.8.2 (-O2).
http://melpon.org/wandbox/permlink/BJQkgZDmbsHEp71j

  struct A { int a; };
  struct B { int a; double b; };

  int f(struct A* a, struct B* b)
  {
    b->a = 123;
    a->a = 456;
    return b->a;
  }

Here, the compiler performs an optimization and returns constant 123
from f(), assuming that two unrelated struct types never alias even if
they have common initial members. I think this optimization is intended
to be allowed by the aliasing rule in both C and C++.
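
For contrast, here is a minimal sketch of the one case where the result
is not in question: f() called with two separate objects. The helper
name call_with_distinct_objects is hypothetical and not part of the
test case linked above. The contested case is the one where both
pointers refer to the same storage, for example f((struct A*)&b, &b),
which is where the constant 123 was observed.

```c
#include <assert.h>

struct A { int a; };
struct B { int a; double b; };

/* The function from this message, unchanged. */
int f(struct A* a, struct B* b)
{
  b->a = 123;
  a->a = 456;
  return b->a;
}

/* Hypothetical helper (an assumption for illustration): calls f() with
   two separate objects, so the two stores cannot touch the same
   storage. */
int call_with_distinct_objects(void)
{
  struct A a = { 0 };
  struct B b = { 0, 0.0 };
  int r = f(&a, &b);
  assert(a.a == 456);  /* the store through a went to its own object */
  return r;            /* 123, whether or not the load is folded */
}
```

With distinct objects, returning the constant 123 and reloading b->a
give the same answer, so the optimization is unobservable. It only
matters when a and b overlap.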

But for this optimization to be allowed in the above case, a compiler
must be permitted to assume that the type of the object at b is B and
the type of the object at a is A, given only the two assignments to and
the one read from their members.

So there seems to be a de facto rule: merely evaluating a member access
operator (even when no actual access follows) implies that the dynamic
type (effective type in C) of the object designated by the left operand
("the object expression" in C++ terms) is a type which satisfies the
aliasing rule, as if the object were accessed through the static type of
the object expression.

Would such a rule be acceptable in the standard? For C++, it would look
like this:

  Change 3.10 p10 (C++) from:
    If a program attempts to access the stored value of an object
    through a glvalue of other than one of the following types the
    behavior is undefined:
  to
    If a program attempts to access the stored value of an object
    <ins>or evaluates a member access expression for a non-static member
    (including interpretation of overloaded operators)</ins> through
    a glvalue of other than one of the following types the behavior is
    undefined:

And define "reuse" to include (along with assignment to a glvalue of a
scalar type, memcpy, etc.) an evaluation of a member access expression
for trivially constructible class types, so that such an evaluation is
taken as the start of the lifetime of an object of a certain type.

With these rules, the above optimization can be said to be OK: if
"a->a = 456" were reusing the storage at b, the following access "return
b->a" would be undefined, as it reads from an object of type B which was
killed by the reuse. Thus a compiler can assume that "a->a = 456" is not
reusing b.

(going back to analysis in C)
> | If "struct B x = *p" follows the above code, the access (lvalue
> | conversion, 6.3.2.1 p2) on *p, which is allowed by the aliasing rule
> | (6.5 p7, bullet 5 "an aggregate or union ..."), changes the effective
> | type of the object at p.
(snip)
> Bonus points: after
>
> struct B x = *p;
>
> what is the effective type of the object at p and &p->x? B and int or B
> and B? Can we even give a meaning to the question?
...(from my follow up)
> | Uh, sorry. I was wrong here. Non-modifying access to an object doesn't
> | change the effective type of subsequent access to the object. And the
> | aliasing rule does not apply here.
>
> Aha, but what about 6.5/6
>
> [...] For all other accesses to an object having no declared type,
> the effective type of the object is simply the type of the lvalue used
> for the access.
>
> ?

A storing access determines the effective type for subsequent
non-modifying accesses. And such subsequent non-modifying accesses do
not fall under "all other accesses", I think.

Then, after "struct B x = *p", the effective types of the objects at p
and at &p->x are both still int.
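
The sequence under discussion can be put together as the following
sketch. The function name store_then_copy and the -1 error value are
assumptions made for illustration only. The comments restate the
reading given above; they are one interpretation of 6.5 p6, not a
settled one.

```c
#include <stdlib.h>

struct B { int x; };

/* Hypothetical helper: allocates storage with no declared type, stores
   through a member, then copies the whole struct out.
   Returns the copied member, or -1 if allocation fails. */
int store_then_copy(void)
{
  struct B* p = malloc(sizeof *p);  /* object has no declared type */
  if (p == NULL)
    return -1;

  p->x = 17;        /* storing access through an lvalue of type int */
  struct B x = *p;  /* non-modifying access through an lvalue of type
                       struct B (6.5 p7, "an aggregate or union ...") */
  free(p);
  return x.x;
}
```

Whatever the effective type at p is taken to be after the copy, the
copy itself reads back the stored 17.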

-- 
k_satoda

Received on 2014-01-18 19:23:18