sg12: Re: [ub] [c++std-core-23844] Re: unions and undefined behavior

From: Marc Glisse <marc.glisse_at_[hidden]>
Date: Thu, 25 Jul 2013 00:08:56 +0200 (CEST)

On Wed, 24 Jul 2013, Howard Hinnant wrote:

> On Jul 24, 2013, at 4:55 PM, Marc Glisse <marc.glisse_at_[hidden]> wrote:
>
>> On Wed, 24 Jul 2013, Howard Hinnant wrote:
>>
>>> This is a crude model of std::map<T, U>::value_type that would allow
>>> map's copy assignment operator to recycle nodes, instead of deleting all
>>> nodes in the lhs prior to inserting new ones. Here is a an example
>>> chart of the performance of the copy assignment of a map<int, int> when
>>> the rhs has 1000 values, the lhs has between 0 and 2000 values (prior to
>>> the assignment), and the implementation is varied between a simple
>>> clear()/insert(), a structural copy (avoids rebalancing the lhs at each
>>> insert), and recycling nodes using something very similar to what is
>>> sketched above with the union V.
>>>
>>> As it stands today, even though no compiler gets this wrong, if I want
>>> this code to be completely guarded against a future optimizer exploiting
>>> the undefined behavior in V 10 years from now, I need to write the map
>>> copy assignment operator in assembly (which would be extraordinarily
>>> challenging - so much so it will never happen).
>>>
>>> The compiler writer's drive to eek out a few % optimization, and backed
>>> by this committee, is preventing me from delivering a 2X to 3X
>>> performance optimization in std::map's copy assignment operator. That
>>> means something is wrong. I hope we can fix it, and have every faith
>>> that we can with everyone's help here.
>>
>> What I am wondering is if you are trying to compensate for a bad design in
>> the library's API by asking for more core features. I may have missed many
>> points, so please be patient and don't take it as a criticism against the
>> people who designed std::map.
>>
>> If I understand correctly, if instead of pair<const K, T> we used pair<K,
>> T> as the value type in the map, your technique would be completely
>> straightforward. Now we are not doing that because we want to make it not
>> too easy for a user to modify the key. So we would like to have a
>> pair<K,T> internally but present to the user something like a pair<const
>> K, T> interface. Well, that seems hard. But if we didn't insist on using a
>> simple aggregate and instead had a regular class with accessors (use
>> first() instead of first, etc), it wouldn't be hard to present a different
>> interface to the user and to the internal functions of std::map.
>>
>> Does that make sense?
>
> Absolutely makes sense. And if we had a clean slate to work with,
> that's exactly what I would recommend.

It is probably too early, but at some point it may be a good idea to
reconsider some design decisions and provide a std::map2 (or std2::map).
Map is probably not the most relevant facility from the standard library
here, but there is a line between keeping compatibility and making the
library obsolete, and a redesign shouldn't be completely ignored, even if
it is far from a first choice.

> I'm currently looking at a benefit / cost decision, and from where I
> sit, the benefit looks large and the cost tiny:
>
> The benefit: We can make the copy assignment operator of map must
> faster for common cases. We're talking 2X, 3X, I've seen some cases go
> better than that. And there are some cases that don't go as well.
> There are no cases that are penalized.

I seldom use maps and never copy-assign them, so I am a bad candidate.

> The cost: Require that compilers do not exploit what today is labeled
> undefined behavior in the union V. Currently no compilers do, to the
> best of my knowledge. There is no change in existing compiler behavior
> that is required. What is required is that standards writers work out
> the language to ensure this constraint on future compiler optimization
> development.
>
> If we were to present this data, the benefit, and the cost, to C++
> programmers, and held a vote among that entire population, do you think
> C++ programmers would be in favor or against this engineering tradeoff?

That's the kind of question where the answer is in how the question is
formulated. If we ask them if they want to forbid compiler optimizations
just to speed-up a seldom used function in one class of the standard
library, the answer won't be the same (yes, I made it obviously biased on
purpose, I don't have enough data to form an opinion for now).

> Do you think C++ programmers might be facing their own dilemmas that
> have striking similarity to ours with std::map? I know we are. Exact
> same issue with unordered_[multi]map.

Copying the "mistakes" of std::map in unordered_map was unfortunate,
though it made sense for uniformity. But if users have the same issue,
I'll give them the same suggestion I gave here. We (well, no credit for
me, but it sounds weird to say "you") built C++ so it has powerful ways to
express all this and then ignore them to write C-like code...

Thanks for your reply and for starting this discussion on map/pair/const,
it has been bothering a lot of people for a long time...

-- 
Marc Glisse

Received on 2013-07-25 00:21:47