sg12

Re: [ub] new revision of p0593

From: Richard Smith <richardsmith_at_[hidden]>
Date: Fri, 09 Feb 2018 21:57:30 +0000
On Fri, 9 Feb 2018 at 13:08, Myria <myriachan_at_[hidden]> wrote:

> Is it worth mentioning that an implementation may have other
> mechanisms that create storage in the manner of malloc? For example,
> it'd make sense for VirtualAlloc or mmap to create implicit objects
> just like the standard malloc functions.
>

Sounds like a good change to me.
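For illustration only, here is a minimal sketch of what that change would permit. It assumes a POSIX system and assumes the implementation extends implicit object creation to anonymous `mmap` the way the standard does for functions like `std::malloc`. `Header`, `map_header` and `unmap_header` are hypothetical names, not from the paper.

```cpp
#include <sys/mman.h>  // POSIX mmap/munmap (VirtualAlloc would play this role on Windows)

// Hypothetical record type used only for illustration.
struct Header {
    int version;
    int length;
};

// Returns a Header living in freshly mapped anonymous memory, or nullptr.
// ASSUMPTION: the implementation treats mmap like malloc for implicit object
// creation, which is what the suggestion above asks the paper to allow.
Header* map_header() {
    void* p = mmap(nullptr, sizeof(Header), PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return nullptr;
    }
    // No placement new: we rely on a Header having been implicitly created.
    Header* h = static_cast<Header*>(p);
    h->version = 1;
    h->length = 0;
    return h;
}

// Releases the mapping created by map_header.
void unmap_header(Header* h) {
    munmap(h, sizeof(Header));
}
```

Without that assumption, the portable choice is an explicit placement new into the mapped storage.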


> In terms of pointer arithmetic, what becomes defined as a result?
> Does the following evil code work?
>
> struct X { int a; int b; int c; };
> int does_this_return_4() {
> alignas(X) std::byte s[sizeof(X)];
> X *x = reinterpret_cast<X *>(s);
> x->c = 4;
> return (&x->a)[2];
> }
>

No objects are created between the line referencing x->c and the line
referencing x->a. For the x->a line to be valid, there must be an array of
ints within its lifetime containing at least three ints starting at &x->a.
For the x->c line to be valid, there must be an int object within its
lifetime named by x->c. So, if the function has defined behavior, we can
conclude that the X object and the int[3] object have overlapping storage
and lifetime, which means one of those objects must be nested within the
other. But we know that neither can be a subobject of the other (both
objects only have subobjects of type `int`), and neither provides storage
for the other (neither transitively contains an array of char-like type).
So we arrive at a contradiction and can conclude that the behavior must be
undefined.
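If index-based access to the three ints is what the code actually wants, one way to avoid the contradiction is to make the array real and to create the object explicitly. This is a minimal sketch; `XArr` and `does_return_4` are hypothetical names, not from the thread.

```cpp
#include <cstddef>  // std::byte
#include <new>      // placement new

// Hypothetical variant of X whose three ints really are an int[3].
struct XArr {
    int v[3];
};

int does_return_4() {
    alignas(XArr) std::byte s[sizeof(XArr)];
    // Explicitly create the object in the byte buffer instead of relying on
    // a reinterpret_cast to conjure one.
    XArr* x = new (s) XArr{};
    x->v[2] = 4;
    // Indexing from &x->v[0] stays inside a genuine int[3], so no second,
    // overlapping int[3] object has to exist for this to be valid.
    return (&x->v[0])[2];
}
```

Here the store and the load use the same array access path, so the aliasing question raised above does not arise.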

I'd note that if we want optimizations similar to GCC's and Clang's
path-sensitive TBAA to be valid (in particular, if we can conclude that a
store to a "c member of X" cannot alias a load of "index >=2 of int[]"),
the above must be UB. As usual, there's a balance to be had here between
allowing evil-but-potentially-meaningful code and allowing
useful-but-potentially-overly-aggressive optimizations.

> The below is a silly idea I had that is admittedly extreme, but would
> preserve a lot of type-based alias analysis.
>
> A simpler memory model that might work is defining memory as scattered
> arrays of bytes: Each byte has metadata specifying either "none" or a
> non-byte (non-char/unsigned char/std::byte) scalar type and an offset
> into that scalar type. Writing to a non-byte scalar changes the
> metadata for those bytes to the type that is written. Reading scalars
> as a non-byte type requires that all bytes have either "none" or a
> "compatible" type with incrementally-increasing offsets starting at 0.
> Writing byte types sets the type of those bytes to "none". Any byte
> may be read as a byte type. Pointer arithmetic would be valid so long
> as the pointer does not cross the byte array that was allocated,
> except that a pointer may point one past the end of such a byte array.
>
> Classes in this model would not factor into the type system at all;
> instead, for purposes of the memory model, members of a class would
> just be offsets into the byte array representing the class instance.
> This would preserve such semantics as reading sockaddr::sa_family from
> what was written as sockaddr_in6::sin6_family. It would also allow a
> lot of other shenanigans we probably don't want to encourage.
>

That certainly seems to allow all the programs I could imagine wanting to
allow, and does still allow simple scalar TBAA (but not path-sensitive
TBAA, nor more sophisticated optimizations such as narrowing the accessible
portion of an object based on access path). The above is spiritually pretty
similar to C's "effective type" rule.
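To make the quoted byte-metadata model concrete, here is a toy shadow-memory tracker for a single allocation. It is a sketch under stated assumptions, not a proposed implementation: the `Tag` enum, the `TaggedArena` class, and the simplification of "compatible" to "same tag" are all hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical scalar-type tags; "compatible" is simplified to "same tag".
enum class Tag : std::uint8_t { None, Int32, Float32 };

// Per-byte metadata: a scalar type and this byte's offset within it.
struct ByteMeta {
    Tag tag = Tag::None;
    std::uint8_t offset = 0;
};

// One allocated byte array with shadow metadata for every byte.
class TaggedArena {
public:
    explicit TaggedArena(std::size_t n) : bytes_(n), meta_(n) {}

    // Writing a non-byte scalar retags the bytes it covers.
    bool write_scalar(std::size_t pos, Tag t, const void* src, std::size_t size) {
        if (!in_bounds(pos, size)) return false;
        std::memcpy(bytes_.data() + pos, src, size);
        for (std::size_t i = 0; i < size; ++i) {
            meta_[pos + i] = ByteMeta{t, static_cast<std::uint8_t>(i)};
        }
        return true;
    }

    // Reading a non-byte scalar: every byte must be None, or tagged t with
    // offsets 0, 1, 2, ... in order.
    bool read_scalar(std::size_t pos, Tag t, void* dst, std::size_t size) const {
        if (!in_bounds(pos, size)) return false;
        for (std::size_t i = 0; i < size; ++i) {
            const ByteMeta& m = meta_[pos + i];
            if (m.tag != Tag::None &&
                (m.tag != t || static_cast<std::size_t>(m.offset) != i)) {
                return false;
            }
        }
        std::memcpy(dst, bytes_.data() + pos, size);
        return true;
    }

    // Writing through a byte type resets that byte to None.
    bool write_byte(std::size_t pos, unsigned char v) {
        if (pos >= bytes_.size()) return false;
        bytes_[pos] = v;
        meta_[pos] = ByteMeta{};
        return true;
    }

    // Any byte may be read as a byte type.
    std::optional<unsigned char> read_byte(std::size_t pos) const {
        if (pos >= bytes_.size()) return std::nullopt;
        return bytes_[pos];
    }

private:
    // Accesses must stay inside the allocation.
    bool in_bounds(std::size_t pos, std::size_t size) const {
        return pos <= bytes_.size() && size <= bytes_.size() - pos;
    }

    std::vector<unsigned char> bytes_;
    std::vector<ByteMeta> meta_;
};
```

Writing an `Int32` at offset 0 and then reading a `Float32` there fails, while reading the same bytes back one at a time always succeeds. This illustrates why only scalar-level TBAA survives in this model: struct types never appear in the metadata.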

My general goal with this sequence of papers (p0137 and now p0593) has been
to shrink the grey area between "clearly valid" and "clearly UB" down to a
much finer dividing line that can be reasonably explained and understood,
with escape hatches where necessary so people can still express what they
need to express. Well-intentioned programs, well-intentioned optimizers, and
static and dynamic analysis tools all often tread into that grey area. But I
think the above approach strays a bit too far towards permissiveness: most
people don't write evil code that needs that kind of rule most of the time,
and setting the rule up that way means that most code will be paying for
flexibility it doesn't need, violating a fundamental tenet of C++ (you don't
pay for what you don't use).

I think we probably do want additional language support to make sockaddr's
shenanigans work (leaving this to the implementation to sort out doesn't
seem like the best approach, although it might be tempting). I personally
don't have a solid idea of what shape that would take, though. And I think
it is reasonable to expect the definition of sockaddr to be changed to
support this (perhaps adding some annotation or attribute).
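Pending such language support, one widely used workaround (a sketch, not something proposed in this thread) is to copy the family field out with `std::memcpy` rather than reading it through a `sockaddr*` that does not point to a `sockaddr`. `family_of` is a hypothetical helper. The sketch assumes the family field sits at the same offset in every `sockaddr_*` type, which is how the sockets API is laid out in practice.

```cpp
#include <netinet/in.h>   // sockaddr_in, sockaddr_in6
#include <sys/socket.h>   // sockaddr, sa_family_t, AF_INET, AF_INET6
#include <cstddef>        // offsetof
#include <cstring>        // std::memcpy

// Reads the address family from any sockaddr_* object by copying bytes.
// ASSUMPTION: the family field has the same offset in every sockaddr_* type.
sa_family_t family_of(const void* addr) {
    sa_family_t family;
    std::memcpy(&family,
                static_cast<const unsigned char*>(addr) + offsetof(sockaddr, sa_family),
                sizeof family);
    return family;
}
```

This avoids the aliasing question for the family field, but it does not let code read `sockaddr` members directly, which is why an annotation on `sockaddr` itself would be the cleaner fix.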


> Melissa
>
> On Fri, Feb 9, 2018 at 11:52 AM, Richard Smith <richardsmith_at_[hidden]>
> wrote:
> > Hi all,
> >
> > Please find attached a revised version of P0593 based on the excellent
> > discussion and feedback at the Albuquerque meeting. Please let me know if
> > you have any comments; I believe our plan was to discuss this again at
> > Jacksonville, and all being well, to forward it to EWG at that meeting.
> >
> > Best regards,
> > Richard
> >
> > _______________________________________________
> > ub mailing list
> > ub_at_[hidden]
> > http://www.open-std.org/mailman/listinfo/ub
> >

Received on 2018-02-09 22:57:44