Date: Wed, 24 Jul 2013 15:10:43 -0700
On Wed, Jul 24, 2013 at 11:46 AM, Richard Smith <richardsmith_at_[hidden]> wrote:
> On Wed, Jul 24, 2013 at 10:27 AM, Jeffrey Yasskin <jyasskin_at_[hidden]>
> wrote:
>>
>> On Wed, Jul 24, 2013 at 8:43 AM, Howard Hinnant
>> <howard.hinnant_at_[hidden]> wrote:
>> > On Jul 23, 2013, at 10:36 PM, Lawrence Crowl <Lawrence_at_[hidden]> wrote:
>> >
>> >>> This isn't an academic argument. If we can keep the compiler from
>> >>> optimizing away the following example code (which has UB from what
>> >>> I've been
>> >>> reading here), we make some parts of the std::library components 2 to
>> >>> 3
>> >>> times faster. That's a much better optimization than what the
>> >>> compilers are
>> >>> getting from actively exploiting UB.
>> >>>
>> >>> #include <utility>
>> >>> #include <string>
>> >>>
>> >>> template <class T, class U>
>> >>> struct pair
>> >>> {
>> >>> T first;
>> >>> U second;
>> >>> };
>> >>>
>> >>> template <class T, class U>
>> >>> union V
>> >>> {
>> >>> typedef pair< T, U> NPair;
>> >>> typedef pair<const T, U> CPair;
>> >>> NPair n;
>> >>> CPair c;
>> >>>
>> >>> V() : c() {}
>> >>> V(const T& t) : c{t} {}
>> >>> V(const T& t, const U& u) : c{t, u} {}
>> >>> V(const V& v) : c(v.c) {}
>> >>> V& operator=(const V& v) {n = v.n; return *this;}
>> >>> V(V&& v) : n(std::move(v.n)) {}
>> >>> V& operator=(V&& v) {n = std::move(v.n); return *this;}
>> >>> ~V() {c.~CPair();}
>> >>> };
>> >>>
>> >>> struct X
>> >>> {
>> >>> std::string s;
>> >>> };
>> >>>
>> >>> struct Y
>> >>> {
>> >>> std::string s;
>> >>> };
>> >>>
>> >>> int
>> >>> main()
>> >>> {
>> >>> typedef V<X, Y> Z;
>> >>> Z z1(X{"some "}, Y{"text"});
>> >>> Z z2 = std::move(z1);
>> >>> }
>> >>
>> >> Can you explain your reasoning here?
>> >>
>> >> BTW, I think there is potential optimization in knowing that a value
>> >> will not change, and so writing into a const variable strikes me as
>> >> being locally optimistic and globally pessimistic.
>> >
>> > This is a crude model of std::map<T, U>::value_type that would allow
>> > map's copy assignment operator to recycle nodes, instead of deleting all
>> > nodes in the lhs prior to inserting new ones. Here is a an example chart of
>> > the performance of the copy assignment of a map<int, int> when the rhs has
>> > 1000 values, the lhs has between 0 and 2000 values (prior to the
>> > assignment), and the implementation is varied between a simple
>> > clear()/insert(), a structural copy (avoids rebalancing the lhs at each
>> > insert), and recycling nodes using something very similar to what is
>> > sketched above with the union V.
>> >
>>
>> Thanks for the concrete use case.
>>
>> There's always the hack of destroying and re-constructing a node when
>> you want to re-use its memory without going through the global
>> allocator. Then you have to figure out what you need to do with
>> pointers to keep them valid. e.g. is "p->~T(); new(p) T(...); use(p);"
>> valid, or must one write "p->~T(); p = new(p) T(...); use(p);"? (note
>> the extra "p=") [basic.life]p7 addresses this ... and says that for
>> class types with a const-qualified data member, you do need to
>> re-assign the pointer. (C++03 through C++14)
>
>
> Right, under the current standard, that is the way to recycle the memory and
> avoid undefined behavior.
>
>> Can you try that approach and see if the no-op assignments cost
>> measurable performance? If they do, probably a clang or llvm bug would
>> be in order.
>>
>> However, I agree that this is more subtle than I'm happy with. We
>> already have the notion of layout-compatible types, but T and const T
>> aren't layout-compatible (C++14[basic.types]p11). What would go wrong
>> if we said that cv-qualified versions of T were layout-compatible with
>> T?
>
>
> It's important to understand the goal of [basic.life]p7 here. It's easiest
> to think about this rule from an implementer's perspective.
>
> From a programmer's perspective, this rule says that if you have a pointer
> or reference to an object, and you destroy the object and create a new one
> in the same place, that old pointer or reference can only be used to access
> the new object if the old and new types match and don't contain any const
> subobjects or reference subobjects.
>
> From an implementer's perspective, this rule says that you can fold together
> a common subexpression that loads from the same position in the same object,
> if you're loading a const data member, a reference member, or a vptr. (The
> vptr case in particular is important for performance, since it allows a lot
> more reasoning about the dynamic type of an object, and thus more inlining
> of virtual function calls.)
>
> If we permitted const and non-const types to be layout-compatible, we'd
> break [basic.life]p7's guarantees and undermine its intent, and an
> implementation would no longer be able to perform common subexpression
> elimination on loads of the same const data member of the same object.
>
> Also, the common initial sequence rules only permit "inspect[ing] the common
> initial part". So either the active member is pair<T,U> and users of the map
> can't modify the .second of values through a pair<const T,U>, or the active
> member is pair<const T,U> and Howard's recycling trick does not work because
> .first is const. We would require additional changes to the common initial
> sequence rules to allow this; changing "layout-compatible" is insufficient.
C's direction in
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_236.htm seems to be
that changing the active member of a union (or likely type-punning
through a union) requires using the union object itself. Given that, I
think loosening the layout-compatible and "initial sequence" rules to
allow Howard's optimization would be fine for implementations, since
they could still fold together const-using common subexpressions
derived from the same non-union pointer. Howard's code would need to
read the address of the intended-to-be-active field from the union
again before storing it into a non-union pointer.
On the other hand, if I'm still wrong, can you think of a good way for
Howard to write his code so it gets the same optimization? He's right
that some types can achieve much more efficient assignment than they
can destruction+re-construction.
Jeffrey
P.S. I'd like to point out that the resolution of C's DR236 is very
bad. The original issue presents an argument that the spec as written
allows the active member to be changed using simple pointers. The
committee resolution says that wasn't intended, but neither presents
an argument that the existing spec actually doesn't allow it, nor
changes the spec to express the intention. We shouldn't be citing a DR
to explain how unions work in C11; we should be able to cite C11
itself.
> On Wed, Jul 24, 2013 at 10:27 AM, Jeffrey Yasskin <jyasskin_at_[hidden]>
> wrote:
>>
>> On Wed, Jul 24, 2013 at 8:43 AM, Howard Hinnant
>> <howard.hinnant_at_[hidden]> wrote:
>> > On Jul 23, 2013, at 10:36 PM, Lawrence Crowl <Lawrence_at_[hidden]> wrote:
>> >
>> >>> This isn't an academic argument. If we can keep the compiler from
>> >>> optimizing away the following example code (which has UB from what
>> >>> I've been
>> >>> reading here), we make some parts of the std::library components 2 to
>> >>> 3
>> >>> times faster. That's a much better optimization than what the
>> >>> compilers are
>> >>> getting from actively exploiting UB.
>> >>>
>> >>> #include <utility>
>> >>> #include <string>
>> >>>
>> >>> template <class T, class U>
>> >>> struct pair
>> >>> {
>> >>> T first;
>> >>> U second;
>> >>> };
>> >>>
>> >>> template <class T, class U>
>> >>> union V
>> >>> {
>> >>> typedef pair< T, U> NPair;
>> >>> typedef pair<const T, U> CPair;
>> >>> NPair n;
>> >>> CPair c;
>> >>>
>> >>> V() : c() {}
>> >>> V(const T& t) : c{t} {}
>> >>> V(const T& t, const U& u) : c{t, u} {}
>> >>> V(const V& v) : c(v.c) {}
>> >>> V& operator=(const V& v) {n = v.n; return *this;}
>> >>> V(V&& v) : n(std::move(v.n)) {}
>> >>> V& operator=(V&& v) {n = std::move(v.n); return *this;}
>> >>> ~V() {c.~CPair();}
>> >>> };
>> >>>
>> >>> struct X
>> >>> {
>> >>> std::string s;
>> >>> };
>> >>>
>> >>> struct Y
>> >>> {
>> >>> std::string s;
>> >>> };
>> >>>
>> >>> int
>> >>> main()
>> >>> {
>> >>> typedef V<X, Y> Z;
>> >>> Z z1(X{"some "}, Y{"text"});
>> >>> Z z2 = std::move(z1);
>> >>> }
>> >>
>> >> Can you explain your reasoning here?
>> >>
>> >> BTW, I think there is potential optimization in knowing that a value
>> >> will not change, and so writing into a const variable strikes me as
>> >> being locally optimistic and globally pessimistic.
>> >
>> > This is a crude model of std::map<T, U>::value_type that would allow
>> > map's copy assignment operator to recycle nodes, instead of deleting all
>> > nodes in the lhs prior to inserting new ones. Here is a an example chart of
>> > the performance of the copy assignment of a map<int, int> when the rhs has
>> > 1000 values, the lhs has between 0 and 2000 values (prior to the
>> > assignment), and the implementation is varied between a simple
>> > clear()/insert(), a structural copy (avoids rebalancing the lhs at each
>> > insert), and recycling nodes using something very similar to what is
>> > sketched above with the union V.
>> >
>>
>> Thanks for the concrete use case.
>>
>> There's always the hack of destroying and re-constructing a node when
>> you want to re-use its memory without going through the global
>> allocator. Then you have to figure out what you need to do with
>> pointers to keep them valid. e.g. is "p->~T(); new(p) T(...); use(p);"
>> valid, or must one write "p->~T(); p = new(p) T(...); use(p);"? (note
>> the extra "p=") [basic.life]p7 addresses this ... and says that for
>> class types with a const-qualified data member, you do need to
>> re-assign the pointer. (C++03 through C++14)
>
>
> Right, under the current standard, that is the way to recycle the memory and
> avoid undefined behavior.
>
>> Can you try that approach and see if the no-op assignments cost
>> measurable performance? If they do, probably a clang or llvm bug would
>> be in order.
>>
>> However, I agree that this is more subtle than I'm happy with. We
>> already have the notion of layout-compatible types, but T and const T
>> aren't layout-compatible (C++14[basic.types]p11). What would go wrong
>> if we said that cv-qualified versions of T were layout-compatible with
>> T?
>
>
> It's important to understand the goal of [basic.life]p7 here. It's easiest
> to think about this rule from an implementer's perspective.
>
> From a programmer's perspective, this rule says that if you have a pointer
> or reference to an object, and you destroy the object and create a new one
> in the same place, that old pointer or reference can only be used to access
> the new object if the old and new types match and don't contain any const
> subobjects or reference subobjects.
>
> From an implementer's perspective, this rule says that you can fold together
> a common subexpression that loads from the same position in the same object,
> if you're loading a const data member, a reference member, or a vptr. (The
> vptr case in particular is important for performance, since it allows a lot
> more reasoning about the dynamic type of an object, and thus more inlining
> of virtual function calls.)
>
> If we permitted const and non-const types to be layout-compatible, we'd
> break [basic.life]p7's guarantees and undermine its intent, and an
> implementation would no longer be able to perform common subexpression
> elimination on loads of the same const data member of the same object.
>
> Also, the common initial sequence rules only permit "inspect[ing] the common
> initial part". So either the active member is pair<T,U> and users of the map
> can't modify the .second of values through a pair<const T,U>, or the active
> member is pair<const T,U> and Howard's recycling trick does not work because
> .first is const. We would require additional changes to the common initial
> sequence rules to allow this; changing "layout-compatible" is insufficient.
C's direction in
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_236.htm seems to be
that changing the active member of a union (or likely type-punning
through a union) requires using the union object itself. Given that, I
think loosening the layout-compatible and "initial sequence" rules to
allow Howard's optimization would be fine for implementations, since
they could still fold together const-using common subexpressions
derived from the same non-union pointer. Howard's code would need to
read the address of the intended-to-be-active field from the union
again before storing it into a non-union pointer.
On the other hand, if I'm still wrong, can you think of a good way for
Howard to write his code so it gets the same optimization? He's right
that some types can achieve much more efficient assignment than they
can destruction+re-construction.
Jeffrey
P.S. I'd like to point out that the resolution of C's DR236 is very
bad. The original issue presents an argument that the spec as written
allows the active member to be changed using simple pointers. The
committee resolution says that wasn't intended, but neither presents
an argument that the existing spec actually doesn't allow it, nor
changes the spec to express the intention. We shouldn't be citing a DR
to explain how unions work in C11; we should be able to cite C11
itself.
Received on 2013-07-25 00:11:04