sg12: Re: [ub] [c++std-core-23844] Re: unions and undefined behavior

From: Howard Hinnant <howard.hinnant_at_[hidden]>
Date: Wed, 24 Jul 2013 17:29:40 -0400

On Jul 24, 2013, at 4:55 PM, Marc Glisse <marc.glisse_at_[hidden]> wrote:

> On Wed, 24 Jul 2013, Howard Hinnant wrote:
>
>> This is a crude model of std::map<T, U>::value_type that would allow
>> map's copy assignment operator to recycle nodes, instead of deleting all
>> nodes in the lhs prior to inserting new ones. Here is a an example
>> chart of the performance of the copy assignment of a map<int, int> when
>> the rhs has 1000 values, the lhs has between 0 and 2000 values (prior to
>> the assignment), and the implementation is varied between a simple
>> clear()/insert(), a structural copy (avoids rebalancing the lhs at each
>> insert), and recycling nodes using something very similar to what is
>> sketched above with the union V.
>>
>> As it stands today, even though no compiler gets this wrong, if I want
>> this code to be completely guarded against a future optimizer exploiting
>> the undefined behavior in V 10 years from now, I need to write the map
>> copy assignment operator in assembly (which would be extraordinarily
>> challenging - so much so it will never happen).
>>
>> The compiler writer's drive to eek out a few % optimization, and backed
>> by this committee, is preventing me from delivering a 2X to 3X
>> performance optimization in std::map's copy assignment operator. That
>> means something is wrong. I hope we can fix it, and have every faith
>> that we can with everyone's help here.
>
> What I am wondering is if you are trying to compensate for a bad design in
> the library's API by asking for more core features. I may have missed many
> points, so please be patient and don't take it as a criticism against the
> people who designed std::map.
>
> If I understand correctly, if instead of pair<const K, T> we used pair<K,
> T> as the value type in the map, your technique would be completely
> straightforward. Now we are not doing that because we want to make it not
> too easy for a user to modify the key. So we would like to have a
> pair<K,T> internally but present to the user something like a pair<const
> K, T> interface. Well, that seems hard. But if we didn't insist on using a
> simple aggregate and instead had a regular class with accessors (use
> first() instead of first, etc), it wouldn't be hard to present a different
> interface to the user and to the internal functions of std::map.
>
> Does that make sense?

Absolutely makes sense. And if we had a clean slate to work with, that's exactly what I would recommend.

I'm currently looking at a benefit / cost decision, and from where I sit, the benefit looks large and the cost tiny:

The benefit: We can make the copy assignment operator of map must faster for common cases. We're talking 2X, 3X, I've seen some cases go better than that. And there are some cases that don't go as well. There are no cases that are penalized.

The cost: Require that compilers do not exploit what today is labeled undefined behavior in the union V. Currently no compilers do, to the best of my knowledge. There is no change in existing compiler behavior that is required. What is required is that standards writers work out the language to ensure this constraint on future compiler optimization development.

If we were to present this data, the benefit, and the cost, to C++ programmers, and held a vote among that entire population, do you think C++ programmers would be in favor or against this engineering tradeoff?

Do you think C++ programmers might be facing their own dilemmas that have striking similarity to ours with std::map? I know we are. Exact same issue with unordered_[multi]map.

Howard

Received on 2013-07-24 23:29:34