std-discussion

Re: Layout compatible classes involving [[no_unique_address]] in the Itanium ABI

From: Andrew Schepler <aschepler_at_[hidden]>
Date: Sun, 18 Oct 2020 12:48:01 -0400
I think there's been a longer-standing problem with the Itanium ABI,
though maybe a less significant one. An important point here is that the ABI
defines "POD for the purpose of layout" in terms of a C++03
POD-struct, so that class layout doesn't depend on the C++ language
version, with the various changing definitions of POD, standard
layout, and/or layout compatible. A note giving rationale for why this
fixed definition is okay claims [
https://itanium-cxx-abi.github.io/cxx-abi/abi.html#POD ]:

> Being tied to the TC1 definition of POD does not prevent compilers from being fully compliant with later revisions. This ABI uses the definition of POD only to decide whether to allocate objects in the tail-padding of a base-class subobject. While the standards have broadened the definition of POD over time, they have also forbidden the programmer from directly reading or writing the underlying bytes of a base-class subobject with, say, memcpy. Therefore, even in the most conservative interpretation, implementations may freely allocate objects in the tail padding of any class which would not have been POD in C++98. This ABI is in compliance with that.
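
To make the tail-padding point concrete, here is a minimal sketch. It is not taken from the ABI document, and the names PodBase, NonPodBase, D1, D2 and print_tail_padding_sizes are made up for illustration. Under the Itanium ABI one would expect the derived class to reuse the base's tail padding only when the base is not POD for the purpose of layout; other ABIs may lay out both cases identically.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical types, for illustration only.
struct PodBase { int n; char c; };                     // a C++03 POD-struct
struct NonPodBase { NonPodBase() {} int n; char c; };  // user-provided constructor: not a C++03 POD

struct D1 : PodBase    { char d; };  // d is expected to go after the base's full sizeof
struct D2 : NonPodBase { char d; };  // d may be placed in the base's tail padding

// Prints the sizes so the two layouts can be compared on a given compiler.
void print_tail_padding_sizes() {
    std::printf("sizeof(PodBase)    = %zu, sizeof(D1) = %zu\n", sizeof(PodBase), sizeof(D1));
    std::printf("sizeof(NonPodBase) = %zu, sizeof(D2) = %zu\n", sizeof(NonPodBase), sizeof(D2));
}
```

Calling print_tail_padding_sizes() from main shows whether sizeof(D2) comes out smaller than sizeof(D1) on the compiler at hand; the exact numbers depend on the target.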

However, the use of tail padding is not the only effect of the "POD
for the purpose of layout" trait. The size, alignment, and member
offsets of any class type are "as specified by the base C ABI" if the
type is POD for the purpose of layout, or exactly specified by the
Itanium ABI if not. So as soon as C++11 allowed "layout-compatible" to
include more types than C++98 POD-structs, we have potential issues
with two layout-compatible class types where just one of them is POD
for the purpose of layout:

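#include <cassert>
struct A { char ac; int ai; };
class B {
    char bc; int bi;      // private: standard-layout, but not a C++03 POD
    friend void check();  // lets check() read bi
};
union U { A a; B b; };

void check() {
    U u;
    u.a.ai = 2;
    assert(u.b.bi == 2);  // read through the common initial sequence
}

int main() {
    check();
}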

This is fine if we can assume that the C ABI gives the same member
offsets as the Itanium ABI for sufficiently simple types (no vptrs,
for a start). And that probably is in fact the case, practically
speaking, since finding each member's minimum offset which doesn't
overlap an earlier member and is aligned is the most obvious
algorithm. But if it might not be, the above well-formed C++ program
could behave badly.
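
The "most obvious algorithm" mentioned above can be sketched as follows. This is only an illustration, not the wording of any ABI; the helper names align_up, next_offset and a_matches_naive_layout are invented here. It computes the offsets for struct A by placing each member at the lowest suitably aligned offset past the previous member, then compares the result with what the compiler actually does.

```cpp
#include <cstddef>

struct A { char ac; int ai; };

// Round `offset` up to the next multiple of `align` (align must be nonzero).
constexpr std::size_t align_up(std::size_t offset, std::size_t align) {
    return (offset + align - 1) / align * align;
}

// The naive rule: a member goes at the lowest aligned offset at or after
// the end of the previous member.
constexpr std::size_t next_offset(std::size_t prev_end, std::size_t align) {
    return align_up(prev_end, align);
}

constexpr std::size_t naive_ac   = next_offset(0, alignof(char));
constexpr std::size_t naive_ai   = next_offset(naive_ac + sizeof(char), alignof(int));
constexpr std::size_t naive_size = align_up(naive_ai + sizeof(int), alignof(A));

// True when the compiler's actual layout of A agrees with the naive rule.
constexpr bool a_matches_naive_layout() {
    return offsetof(A, ac) == naive_ac
        && offsetof(A, ai) == naive_ai
        && sizeof(A) == naive_size;
}
```

On common targets a_matches_naive_layout() is expected to be true, which is the practical assumption the paragraph above relies on; nothing in the C++ standard guarantees it.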

The other apparent purpose here, in involving the C ABI, is to allow
an implementation to guarantee some amount of compatibility of structs
when C code and C++ code are linked together. I don't know the exact
guarantee, if it's even written out anywhere, but maybe something like
"given a C struct or union type CT and a C++ class type CXXT, if a
hypothetical type T in the common subset of C89 and C++03 exists where
T is compatible with CT per the C version used and T is
layout-compatible with CXXT per the C++ version used, then an object
of type CT initialized by C code can be accessed via type CXXT in C++
code, and an object of type CXXT created by C++ code can be accessed
via type CT in C code." Likely some caveats for strict aliasing rules
would also apply.

The difference which comes up in the [[no_unique_address]] case is the
Itanium ABI's "dsize", the bytes of an object occupied by data, where
the final (sizeof(T)-dsize(T)) padding bytes of a
potentially-overlapping object can be used for other subobjects. For a
type POD for the purpose of layout, dsize(T)==sizeof(T), but otherwise
the Itanium rules may set dsize(T)<sizeof(T). This leads to the
difference dsize(A1)!=dsize(A2), and so
offsetof(B1,ch)!=offsetof(B2,ch).
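
The definitions of A1, A2, B1 and B2 come from earlier in the thread and are not repeated in this message. The sketch below is a hypothetical reconstruction of that kind of example, not the original code: A1 is a C++03 POD-struct, A2 differs only in having private members, and print_ch_offsets is an invented helper. The two B types are layout-compatible as far as the standard is concerned, yet an Itanium-ABI compiler may give ch different offsets in them.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical reconstruction; the thread's actual types are not shown here.
struct A1 { int i; char c; };  // C++03 POD-struct: dsize(A1) == sizeof(A1) under Itanium
class  A2 { int i; char c; };  // private members: standard-layout, but not a C++03 POD

struct B1 { [[no_unique_address]] A1 a; char ch; };
struct B2 { [[no_unique_address]] A2 a; char ch; };

// Prints where ch ends up in each type on the compiler at hand.
void print_ch_offsets() {
    std::printf("offsetof(B1, ch) = %zu, sizeof(B1) = %zu\n", offsetof(B1, ch), sizeof(B1));
    std::printf("offsetof(B2, ch) = %zu, sizeof(B2) = %zu\n", offsetof(B2, ch), sizeof(B2));
}
```

If the two printed offsets differ, reading b2.ch through a union after writing b1.ch would look at the wrong byte, even though the common initial sequence rule allows the access.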

Fixing the Itanium ABI is tricky, then, attempting both to comply with
C++ layout-compatible requirements (which have frequently expanded in
definition) and to keep the C-language compatibility guarantee. Since
it was already relying on the fact that the Itanium member layout
rules for non-polymorphic class types match the C ABI's member
layouts, perhaps Itanium could just explicitly require those C ABI
rules match its definition. Or, maybe instead of basing "POD for the
purpose of layout" on the smallest possible set of class types (C++03
POD-struct and POD-union), base it on the widest possible set of
types: types not requiring any vptr.

For example, let's say a "language-compatible" type is a scalar type,
or an array of language-compatible type elements, or a class type with
no virtual bases, no virtual functions, and no non-static data members
with non-language-compatible type. And for a non-polymorphic non-union
class type T, the "potentially common initial sequence" of T is its
direct base classes, followed by the largest possible initial sequence
of direct non-static data members of T which are language-compatible.
The subobject offsets for the elements of the potentially common
initial sequence are the same as the C ABI defines for a struct whose
members have those types, with all subobjects of subobjects
transformed similarly and recursively. Also, the initial align(T) as
used by the Itanium layout algorithm is the alignment of that struct
type. If the C ABI doesn't define rules for empty types (e.g. for an
extension to the C language allowing a struct with no members), use
only the non-empty elements of the potentially common initial sequence
in this C struct, then find offsets for the empty elements as Itanium
already specifies. Next initialize dsize(T) with the maximum of
offsetof(T,o)+dsize(o) ranging over the subobjects o of T. And
finally, process any non-static data members of T which are not in the
potentially common initial sequence as per the existing Itanium rules.
There might be errors or missing details in this description, but
that's the idea.
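
The dsize initialization step described above can be modelled with a toy calculation. This is a sketch under stated assumptions, not part of the Itanium ABI: Subobject, max_sz, initial_dsize and the example numbers are all made up, and the offsets are supplied by hand because C++ has no way to enumerate a class's subobjects.

```cpp
#include <cstddef>

// Toy description of one subobject o of T: its offset within T and dsize(o).
struct Subobject {
    std::size_t offset;
    std::size_t dsize;
};

constexpr std::size_t max_sz(std::size_t a, std::size_t b) {
    return a > b ? a : b;
}

// Maximum of offset + dsize over the first `count` entries of `subs`
// (0 for an empty list). Written recursively so it stays valid as a
// C++11 constexpr function.
constexpr std::size_t initial_dsize(const Subobject* subs, std::size_t count) {
    return count == 0
        ? 0
        : max_sz(subs[0].offset + subs[0].dsize, initial_dsize(subs + 1, count - 1));
}

// Hypothetical T with an int at offset 0 (assumed dsize 4) and a char at
// offset 4 (dsize 1).
constexpr Subobject example_subobjects[] = { {0, 4}, {4, 1} };
constexpr std::size_t example_dsize = initial_dsize(example_subobjects, 2);
```

With these assumed numbers example_dsize is 5, so three bytes of a presumably 8-byte T would remain available as tail padding for a later potentially-overlapping subobject.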

-- Andrew Schepler



On Wed, Oct 14, 2020 at 5:55 PM Lénárd Szolnoki via Std-Discussion
<std-discussion_at_[hidden]> wrote:
>
> Hi,
>
> On Wed, 14 Oct 2020 11:20:17 -0400
> Jason McKesson via Std-Discussion <std-discussion_at_[hidden]>
> wrote:
>
> > The language makes sense. Changing it as you suggest would make it
> > make less sense. This is a problem caused by the Itanium ABI; it
> > should be solved *by* the Itanium ABI. Whether they do it by treating
> > private members correctly in accord with the standard, or by doing
> > what you suggest, it's on them to implement the standard as it
> > currently is.
>
> Yes the language makes sense, it's a simple rule for common initial
> sequences.
>
> The Itanium ABI also makes sense, it uses the same layout rules for
> [[no_unique_address]] members as for base subobjects. It allows library
> implementations to switch from EBO to [[no_unique_address]] without
> changing layout.
>
> Both of these were the intent of the [[no_unique_address]] proposal[1],
> as described in its FAQ:
>
> > Q: Can a standard library switch from EBO to this attribute without an
> > ABI break?
> > The intent is that an ABI can specify the same layout rule for a
> > member with the attribute as it does for a base class. In an ABI that
> > makes that choice, yes.
> > Q: Does this allow reuse of tail padding? (Eg, three bytes at the end
> > of struct A { int n; char c; };)
> > The general rule is that the layout is just like for a base class.
> > Tail padding reuse is permitted for base classes, so it's also
> > permitted for members with the attribute.
> > ...
> > Q: Does the attribute affect whether a type is standard-layout?
> > No.
> > Q: Does the attribute affect the "common initial sequence" rule?
> > Yes. For two structs to be considered to have a common initial
> > sequence, their initial sequences of common members must make
> > consistent use of the attribute.
>
> In short we have two conflicting intents here:
> 1. Allowing ABIs to use the same layout rules for [[no_unique_address]]
> members and base class subobjects.
> 2. Allowing [[no_unique_address]] members in common initial sequences
> (at arbitrary position).
>
> We can't have both. The current wording chooses the second option, but
> it doesn't reflect all the intents of the proposal, as it can't.
>
> I think it's also worth submitting this as a core language issue. I
> never submitted one so I wanted to float the problem here first. I
> would be glad if more people chimed in with their opinion.
>
> [1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0840r2.html
> --
> Std-Discussion mailing list
> Std-Discussion_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-discussion

Received on 2020-10-18 11:48:16