sg12: Re: [ub] type punning through congruent base class?

From: Gabriel Dos Reis <gdr_at_[hidden]>
Date: Fri, 17 Jan 2014 15:05:13 +0000

| -----Original Message-----
| From: ub-bounces_at_open-std.org [mailto:ub-bounces_at_open-std.org] On
| Behalf Of James Dennett
| Sent: Thursday, January 16, 2014 11:50 PM
| To: WG21 UB study group
| Subject: Re: [ub] type punning through congruent base class?
|
| On Thu, Jan 16, 2014 at 11:38 AM, Herb Sutter <hsutter_at_microsoft.com>
| wrote:
| > “It’s not a T until the T ctor runs” has to be part of the rules somewhere.
|
| That'd be nice, but it's hard to make it compatible with C, where you
| can get a value of type T either by declaring one, or by copying an
| existing one (via assignment, or via memcpy/memmove), and maybe in
| other ways that I'm not remembering.

I think we can make that work.
I already proposed that if you memcpy or memmove into storage with dynamic storage duration, then you're also transferring the dynamic type. (That is close to C's transfer of effective type.)
The point is that we do have the conceptual notion of a constructor; the question is what the notation is. For the vast majority of cases, I think the rule has to be what Herb is suggesting. And then we can take care of the rest, the special cases. The next thing is to address external data structures -- even C does not appear to have answers for that, despite some programmers believing it does :-)
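
A minimal sketch of the memcpy case mentioned above, for illustration only. The helper name clone_int_bytes is hypothetical, and the comments describe the proposed reading, not settled wording:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: copies the object representation of an existing int
// into freshly malloc'ed storage and hands the storage back as an int*.
// The caller owns the result and releases it with std::free.
int* clone_int_bytes(const int& source) {
    void* raw = std::malloc(sizeof(int));    // dynamic storage, no declared type
    if (raw == nullptr) return nullptr;
    // Under the proposal, this copy would transfer the dynamic type (int)
    // together with the bytes, much like C's transfer of effective type.
    std::memcpy(raw, &source, sizeof(int));
    return static_cast<int*>(raw);
}
```

No constructor syntax appears anywhere in the helper, which is why the memcpy case needs its own rule.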

| Certainly C code assumes that
| recreating the same bytes is sufficient to be allowed to use them as a
| T object.
|
| > It is there for non-trivially-constructible types.
| >
| > Do we have a defect here that we’re missing that rule for
| > trivially-constructible types?
|
| I think not, because (as I see it) such a rule would be a defect in
| that it would invalidate much valid C/C++ code.
|
| int *p = (int*)malloc(sizeof(int));
| *p = 42;
| *p = 0;
|
| After this, there is an int, and p points to it, and at no time was
| any constructor (trivial or otherwise) used. C doesn't have
| constructors. C has regions of memory and effective types, and the
| act of writing to a piece of memory sets its effective type.

Yes, but we have to be careful about what we mean by that. Writing to individual members of what is supposed to be a structure does not set the effective type of the structure, until an access is made as a structure.
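
To make the member-wise case concrete, here is a small sketch. Pair, make_pair_memberwise and read_whole are hypothetical names, and the comments paraphrase the reading given above instead of quoting either standard:

```cpp
#include <cassert>
#include <cstdlib>

struct Pair {
    int first;
    int second;
};

// Hypothetical helper: fills malloc'ed storage one member at a time.
Pair* make_pair_memberwise(int a, int b) {
    Pair* s = static_cast<Pair*>(std::malloc(sizeof(Pair)));
    if (s == nullptr) return nullptr;
    // On the reading above, each store is a write of an int; neither one,
    // taken alone, makes the whole region a Pair.
    s->first = a;
    s->second = b;
    return s;
}

// Reads the region as a whole structure: this is the "access as a structure".
Pair read_whole(const Pair* s) {
    return *s;
}
```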

| We could say that each assignment above constructs an int first, but
| then we'd also want to say that it destroys the previous int... except
| that there wasn't an int there until the first assignment put one
| there. It cannot be a precondition of "*p = 0" that p points at an
| int, and it can't be a precondition that it doesn't. Both worked just
| fine in C; assignment was memcpy, and gave the same object
| representation, and hence the same value. (I seem to recall reading
| C's rules and finding that they are also self-contradictory when read
| in more depth, but I certainly lack motivation to try to fix that.)

We can say that the first write to *p sets the dynamic type to int, and everything else is assignment.

C's model isn't without its own problems. I believe Xavier Leroy (the author of CompCert) worked to make sense of most of it, but the last time we discussed these issues (about 2 years ago, so a lot may have happened since then) we were still trying to understand what it really means in C to have a value of structure type -- this is in connection with modelling the semantics of calling a function that accepts or returns a structure by value. Xavier is on the list, so I trust he will chime in when he gets a chance to work his way through the blizzard of messages.

| Maybe we could say that "*p = 0" ends the lifetime of the
| trivially-destructible object that was previously at p, if there was
| one (but not one of type int). Seems fragile to me, but it
| approximates what C permits.

I think this works too, and may be a way out of this mess. It definitely needs more exploring.
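
For comparison, a sketch of how storage reuse is spelled explicitly in C++ today. The suggestion above would let a plain "*p = 0" have a similar effect; this sketch uses placement new instead, and the function name reuse_storage_as_int is hypothetical:

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical helper: 'raw' must point to storage large enough and suitably
// aligned for both float and int (storage returned by malloc qualifies).
int* reuse_storage_as_int(void* raw) {
    float* f = ::new (raw) float(1.5f);  // the storage now holds a float
    assert(*f == 1.5f);
    // Reusing the storage ends the lifetime of the trivially destructible
    // float and starts the lifetime of an int in the same place.
    int* p = ::new (raw) int(0);
    return p;
}
```

The float is never read after the storage has been reused.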

| I agree that what C++ says is untenable (allocating storage really
| can't start the lifetime of objects of types that aren't even
| expressed in code at the time the allocation happens), but C didn't
| distinguish between overwriting a bunch of bytes and overwriting a
| value of a type.

Yup.

| C++ approximately mirrors C in saying that "An
| object is a region of storage", but we only half mean it. We have two
| notions of what objects are -- one mostly inherited from C, and a new,
| arguably-cleaner one added by C++.

I don't think that C++ has ever had a notion of object that didn't come with a type, e.g. constructor/initialization. And that was a deliberate design decision and fundamental principle. The way I've always heard Bjarne express this is: a constructor turns raw memory into an object, and a destructor turns an object back into mere memory. Also, see page 16 of the slides for his talk "Foundations of C++" at ETAPS 2012:

http://cs.ioc.ee/etaps12/invited/stroustrup-slides.pdf
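
The constructor/destructor view described above can be sketched with placement new and an explicit destructor call. Tracked and construct_use_destroy are hypothetical names chosen for this illustration:

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical type that counts how many of its objects are currently alive.
struct Tracked {
    static int live;
    int value;
    explicit Tracked(int v) : value(v) { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Returns the value seen while the object was alive, or -1 if malloc fails.
int construct_use_destroy(int v) {
    void* raw = std::malloc(sizeof(Tracked));  // raw memory, no object yet
    if (raw == nullptr) return -1;
    Tracked* t = ::new (raw) Tracked(v);       // constructor: memory becomes an object
    assert(Tracked::live == 1);
    int seen = t->value;
    t->~Tracked();                             // destructor: object becomes memory again
    std::free(raw);
    return seen;
}
```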

| Where the two collide we get
| unpleasant quirks, like the issue raised some while ago (maybe by
| Gaby) where p->~int() is explicitly a no-op, and doesn't end the
| lifetime of the int.

Yes, that is right. We need to fix that too -- it was an unfortunate optimization. We had a sustained discussion when I brought it up, but we need to fix it.
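
For readers unfamiliar with the construct, here is a sketch of where p->~int() style calls typically arise: generic code that destroys a T without knowing whether T is a class type. The names destroy_at_address and destroy_int_in_generic_code are hypothetical. The sketch does not read the int after the call, so it does not depend on whether the call ends the lifetime:

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical generic helper: for a class type this runs the destructor;
// for T = int the same syntax is a pseudo-destructor call.
template <typename T>
void destroy_at_address(T* p) {
    p->~T();
}

// Returns true if the int held the expected value before being "destroyed".
bool destroy_int_in_generic_code() {
    void* raw = std::malloc(sizeof(int));
    if (raw == nullptr) return false;
    int* p = ::new (raw) int(42);
    bool ok = (*p == 42);
    destroy_at_address(p);  // the call discussed above, written as p->~T()
    // *p is deliberately not read past this point.
    std::free(raw);
    return ok;
}
```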

-- Gaby

Received on 2014-01-17 16:05:29