sg12: Re: [ub] [c++std-core-23844] Re: unions and undefined behavior

From: Gabriel Dos Reis <gdr_at_[hidden]>
Date: Wed, 24 Jul 2013 11:29:31 -0500

Howard --

SG12 has been created and specifically tasked for this set of problems.
See

http://www.open-std.org/pipermail/ub/2013-May/000005.html

I would like to move to discussion there.

I do have one comment; see below.

Howard Hinnant <howard.hinnant_at_[hidden]> writes:

| On Jul 23, 2013, at 10:36 PM, Lawrence Crowl <Lawrence_at_[hidden]> wrote:
|
| >> This isn't an academic argument. If we can keep the compiler from
| >> optimizing away the following example code (which has UB from what I've been
| >> reading here), we make some parts of the std::library components 2 to 3
| >> times faster. That's a much better optimization than what the compilers are
| >> getting from actively exploiting UB.
| >>
| >> #include <utility>
| >> #include <string>
| >>
| >> template <class T, class U>
| >> struct pair
| >> {
| >> T first;
| >> U second;
| >> };
| >>
| >> template <class T, class U>
| >> union V
| >> {
| >> typedef pair< T, U> NPair;
| >> typedef pair<const T, U> CPair;
| >> NPair n;
| >> CPair c;
| >>
| >> V() : c() {}
| >> V(const T& t) : c{t} {}
| >> V(const T& t, const U& u) : c{t, u} {}
| >> V(const V& v) : c(v.c) {}
| >> V& operator=(const V& v) {n = v.n; return *this;}
| >> V(V&& v) : n(std::move(v.n)) {}
| >> V& operator=(V&& v) {n = std::move(v.n); return *this;}
| >> ~V() {c.~CPair();}
| >> };
| >>
| >> struct X
| >> {
| >> std::string s;
| >> };
| >>
| >> struct Y
| >> {
| >> std::string s;
| >> };
| >>
| >> int
| >> main()
| >> {
| >> typedef V<X, Y> Z;
| >> Z z1(X{"some "}, Y{"text"});
| >> Z z2 = std::move(z1);
| >> }
| >
| > Can you explain your reasoning here?
| >
| > BTW, I think there is potential optimization in knowing that a value
| > will not change, and so writing into a const variable strikes me as
| > being locally optimistic and globally pessimistic.
|
| This is a crude model of std::map<T, U>::value_type that would allow
| map's copy assignment operator to recycle nodes, instead of deleting
| all nodes in the lhs prior to inserting new ones. Here is a an
| example chart of the performance of the copy assignment of a map<int,
| int> when the rhs has 1000 values, the lhs has between 0 and 2000
| values (prior to the assignment), and the implementation is varied
| between a simple clear()/insert(), a structural copy (avoids
| rebalancing the lhs at each insert), and recycling nodes using
| something very similar to what is sketched above with the union V.
|
| As it stands today, even though no compiler gets this wrong, if I want
| this code to be completely guarded against a future optimizer
| exploiting the undefined behavior in V 10 years from now, I need to
| write the map copy assignment operator in assembly (which would be
| extraordinarily challenging - so much so it will never happen).
|
| The compiler writer's drive to eek out a few % optimization, and
| backed by this committee, is preventing me from delivering a 2X to 3X
| performance optimization in std::map's copy assignment operator. That
| means something is wrong. I hope we can fix it, and have every faith
| that we can with everyone's help here.
|
| The challenge before this committee is *not* to define undefined
| behavior. It is to, in some reasonable way, forbid compilers from
| drastically changing the meaning of code between unoptimized and
| optimized builds by allowing optimizers to silently reason that "all
| bets are off" when it detects behavior that this committee has labeled
| as undefined.
|
| Howard

I strongly disagree with attempts to characterize compiler writers or
optimizer writers as evil warlords out there to get C++ programmers, and
the C++ committee being their obedient servants or purveyors :-)

I suspect the fundamental problem is our collective inability to get the
standards say precisely what mean and to mean what we say, and foremost
to remain true to logical coherence.

When the optimizers were just conducting local business, it didn't
matter much that the left hand didn't know what the right hand was
doing; and sloppy logic (in the standards, that is) wasn't that harmful.

However, those days are long gone -- gone before we even had the first
C++ standards. Optimizers are capable of merging facts across statement
boundaries, function boundaries, and translation boundaries -- and
as C++ programmers, we do actually love that (see C++ template model).
This is only going to continue. The celebrated One Definition Rule was
an early recognition that global logical coherence is fundamental.

If we want C++ to remain competitive, I believe the route isn't to tell
programming tools to stop propagating facts they can gather from
static analysis or logical inference. We have to reduce the reach of
unrestricted behavior, but that would start with reducing violence to
logic; we have to embrace the fact that types are there to give meaning
to programs and composition (e.g. abstraction tools, functions,
templates, etc.) is our friend. Each time we deviate from composition
is an opportunity to trip over ourselves. Again, I will point out that
the "visible alias rule" as C99 attempted is a good case study.

If we want type-based alias analysis -- which is an established
tradition -- then we have to take aliasing rules very seriously.
We have to find better (and more uniform) rules for when it is OK to
access an object value. We should also seriously consider standard
library routines as alternatives, if crafting core language rules would
open unacceptable logical holes. Not every program one can write in C++
has to be reduced to elemental core language (clauses 1 through 16 only.)

-- Gaby

Received on 2013-07-24 18:50:06