sg12: Re: [ub] type punning through congruent base class?

From: Gabriel Dos Reis <gdr_at_[hidden]>
Date: Thu, 16 Jan 2014 22:58:16 +0000

Again, just to reassure you: we are dealing with massive, real-world existing code bases on our side :-)

-- Gaby

From: ub-bounces_at_[hidden] [mailto:ub-bounces_at_[hidden]] On Behalf Of Richard Smith
Sent: Thursday, January 16, 2014 2:55 PM
To: WG21 UB study group
Subject: Re: [ub] type punning through congruent base class?

On 16 January 2014 14:39, Gabriel Dos Reis <gdr_at_[hidden]> wrote:
I had always assumed that you were mindful of that paragraph, which explains part of my puzzlement at your previous claims. It is so central to the entire C++ object model. Thanks for clarifying.

We may have to tweak this paragraph, but I don’t think the root cause is here. It is elsewhere – the very paragraph under discussion in previous messages.

By the way, my concern isn’t driven by theory. I would like to make sure that future high-quality C++ implementations can actually include instrumentation of lifetime and aliasing issues, the way implementations currently do with “address sanitizers”, “signed integer arithmetic overflow” checks, etc.

That's also my goal. I started digging into this precisely because I was designing an object lifetime sanitizer, and this very rapidly led to the observation that the model described in 1.8/1 is untenable as a basis for such a sanitizer, because it does not match the model used by programmers and implementers. Such a sanitizer would diagnose problems in a large quantity of real code in which people assume they can obtain appropriate storage any way they like and use it as an object of type T, so long as T is suitably trivial.
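To make that pattern concrete, here is a minimal sketch of the kind of code described above. The `Point` type and both functions are hypothetical, written only for illustration: each obtains suitably sized and aligned storage by a different means and then uses it as a `Point` without any new-expression, which is what a sanitizer built on the 1.8/1 model would have to flag.

```cpp
#include <new>

// Hypothetical trivial type, for illustration only.
struct Point {
    int x;
    int y;
};

// Storage obtained from a suitably aligned static byte buffer, then used
// directly as a Point. No definition or new-expression of a Point appears.
int use_static_buffer() {
    alignas(Point) static unsigned char buf[sizeof(Point)];
    Point* p = reinterpret_cast<Point*>(buf);
    p->x = 3;
    p->y = 4;
    return p->x + p->y;
}

// Storage obtained from the global allocation function, then used
// directly as a Point. Again no Point is created by a new-expression.
int use_operator_new() {
    void* raw = ::operator new(sizeof(Point));
    Point* p = static_cast<Point*>(raw);
    p->x = 5;
    p->y = 6;
    int sum = p->x + p->y;
    ::operator delete(raw);
    return sum;
}
```

Both functions behave as their authors expect on mainstream implementations, which is the point being made: the written model and the practised model disagree.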

From: ub-bounces_at_[hidden] [mailto:ub-bounces_at_[hidden]] On Behalf Of Richard Smith
Sent: Thursday, January 16, 2014 1:48 PM
To: WG21 UB study group
Subject: Re: [ub] type punning through congruent base class?

On 16 January 2014 13:10, Herb Sutter <hsutter_at_microsoft.com> wrote:
> > | > Well, as far as I understand, trivial constructors might not be called at all.
> > |
> > | Maybe "might not be called" is part of the bug then. Isn't the right
> > | model that trivial ctors are called, but "might do nothing"?
> >
> > Yes, agreed.
>
> That may be part of a proposed model, but it's not part of the existing model,
> which (for compatibility) is based on C's model, where fields of a (necessarily
> trivially-constructible) struct can be written to as soon as suitable memory for
> that struct is obtained.

Ah, I see the problem, thanks James and Daveed. Let me try to summarize what I've learned so far.

In C++, "storage" and "lifetime" don't mean the same thing for a non-trivially-constructible type.

But in C the word "lifetime" has a different meaning than in C++. I have C99 handy, but it should be the same in C11, and "lifetime" and "storage duration" are the same thing in C:

        6.2.4/1: An object has a storage duration that determines its lifetime. There are three storage durations: static, automatic, and allocated. Allocated storage is described in 7.20.3.

        6.2.4/2: The lifetime of an object is the portion of program execution during which storage is guaranteed to be reserved for it.

        7.20.3: The order and contiguity of storage allocated by successive calls to the calloc, malloc, and realloc functions is unspecified. The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated). The lifetime of an allocated object extends from the allocation until the deallocation. Each such allocation shall yield a pointer to an object disjoint from any other object. The pointer returned points to the start (lowest byte address) of the allocated space.

But one thing that follows from the three quotes above is that in C, "the object" is the raw memory. Right?

So is that different in C++?

There's one other rule that I'd overlooked in my prior messages. [intro.object]/1 presents this beautiful model: "An object is a region of storage. [...] An object is created by a definition (3.1), by a new-expression (5.3.4) or by the implementation (12.2) when needed.[...] An object has a storage duration (3.7) which influences its lifetime (3.8). An object has a type (3.9)."

Trouble is, this model gives a vast quantity of real-world C++ code undefined behavior, including any C-like code that uses the 'malloc' technique described in this thread. This breaks compatibility with C and with real-world C++ so thoroughly that I think it's not a tenable position, and that the rule must be revised. Prior to this thread, I had assumed that we would easily reach consensus on that, but it seems that's not the case.
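A minimal sketch of the 'malloc' technique, assuming a hypothetical trivially-constructible struct `Pair`, next to a variant that uses a placement new-expression and therefore does create an object under the [intro.object]/1 wording quoted above. Both functions are illustrative only.

```cpp
#include <cstdlib>
#include <new>

// Hypothetical C-compatible struct, for illustration only.
struct Pair {
    int first;
    int second;
};

// The C idiom: malloc'd memory is used directly as a Pair.
// (The cast is needed in C++; in C the void* converts implicitly.)
// Under [intro.object]/1 no Pair object is created here, since there is
// no definition and no new-expression.
int c_style() {
    Pair* p = static_cast<Pair*>(std::malloc(sizeof(Pair)));
    if (!p) return -1;
    p->first = 1;
    p->second = 2;
    int r = p->first + p->second;
    std::free(p);
    return r;
}

// A form that does fit the [intro.object]/1 model: a placement
// new-expression explicitly creates the Pair in the malloc'd storage.
int cpp_style() {
    void* raw = std::malloc(sizeof(Pair));
    if (!raw) return -1;
    Pair* p = ::new (raw) Pair{1, 2};
    int r = p->first + p->second;
    p->~Pair();  // trivial destructor; called only for symmetry
    std::free(raw);
    return r;
}
```

The two functions differ only in the placement new-expression; for a trivial type like `Pair` that expression typically compiles to the same stores, which helps explain why so much existing code omits it.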

As Daveed pointed out:

> > Am I missing something?
>
> I think so: 3.8/1 also says that the lifetime of an object ends if its storage is
> reused

"...or released" -- okay. I always thought "reused" meant deallocated and reallocated, but since it says "reused or released", it does seem to follow that "reused" means what Daveed said:

> . So when you write to *ps, you've terminated the lifetime of *pb.

OK, that's C++. The word "reused" doesn't appear anywhere in the C99 standard at least.
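As a hedged illustration of "reuse" in the 3.8/1 sense: placement new is the uncontroversial case, sketched below with a hypothetical buffer named `storage`. Whether an ordinary write through a pointer of a different type also counts as reuse is exactly what this thread is asking, so the sketch deliberately does not depend on that.

```cpp
#include <new>

// Hypothetical buffer, large enough and suitably aligned for an int or a float.
// With two alignment-specifiers, the stricter alignment applies.
alignas(int) alignas(float)
unsigned char storage[sizeof(int) > sizeof(float) ? sizeof(int) : sizeof(float)];

float reuse_storage() {
    // Begin the lifetime of an int in the buffer.
    int* pi = ::new (static_cast<void*>(storage)) int(42);
    (void)*pi;  // fine: the int is alive

    // Reusing the same storage for a float ends the int's lifetime
    // (3.8/1, storage "is reused"). From here on, reading *pi is undefined.
    float* pf = ::new (static_cast<void*>(storage)) float(1.5f);

    // Accessing the new object through the pointer new returned is fine.
    return *pf;
}
```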

Trying to summarize:

So was the goal of the 3.8/1 wording of "reuse" to acknowledge (and be compatible with) the C memory-scribbling, while still being able to say that even for raw storage and PODs a given piece of storage has a well-defined type at any given point in time (namely the one it was last written to as)? And does "reused" clearly say that "written-to" part if that's the intent?

And so for raw memory and PODs, the storage's use as a T begins when you read or write it as a T, and ends when you read or write it as some other type U, as long as T and U are PODs that fit in the storage (incl. size and alignment)?

And dare I ask: Um, what about using part of the same allocated buffer to hold a T and another part to hold a U, such as malloc(1000) and read/write an int at offset 10 and read/write a short at offset 314 -- they're the same allocation, so do we even have any way to talk about that?
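One way real code sidesteps the alignment half of this question is to copy bytes in and out with `std::memcpy`. The sketch below uses Herb's numbers (a 1000-byte `malloc`, an `int` at offset 10, a `short` at offset 314); the function names are hypothetical. On common platforms where `alignof(int)` is 4, offset 10 is not suitably aligned for a direct `int*` access, hence the byte copies. Note that this only moves bytes: it does not settle whether an `int` object "exists" at offset 10, which is the part the object model would need to answer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Write an int and a short into different parts of one allocation and read
// them back, copying bytes so misalignment is never an issue.
bool mixed_buffer_roundtrip() {
    unsigned char* buf = static_cast<unsigned char*>(std::malloc(1000));
    if (!buf) return false;

    const std::size_t int_off = 10;
    const std::size_t short_off = 314;

    int i_in = 12345;
    short s_in = -7;
    std::memcpy(buf + int_off, &i_in, sizeof i_in);    // the int's bytes
    std::memcpy(buf + short_off, &s_in, sizeof s_in);  // the short's bytes

    int i_out = 0;
    short s_out = 0;
    std::memcpy(&i_out, buf + int_off, sizeof i_out);
    std::memcpy(&s_out, buf + short_off, sizeof s_out);

    std::free(buf);
    return i_out == i_in && s_out == s_in;
}

// Run-time check of whether a direct access to a T at address p would be
// suitably aligned (the "incl. size and alignment" condition above).
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```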

Herb

_______________________________________________
ub mailing list
ub_at_[hidden]
http://www.open-std.org/mailman/listinfo/ub

Received on 2014-01-16 23:58:23