C++ Logo

sg14

Advanced search

Re: Colony/hive & unordered vs ordered == operator

From: Henry Miller <hank_at_[hidden]>
Date: Tue, 15 Mar 2022 08:45:14 -0500
The more I think about this the more convinced I become that hive should not have operator== at all.

First of all: put on your "I just got my CS degree and have been doing C++ for 2 weeks" hat: intuitively I expect operator== is O(n). That it is instead O(n lg n) is an unpleasant surprise. Sure we can say it is in the documentation, but I hate documenting something unexpected without good reason. I also think SG20 (education) needs to approve such documentation as they need to teach it. C++ has enough surprises. (for completeness I'll say that only allowing == when O(n) sort applies seems obviously unacceptable)

If there is an operator == it must be ordered! Nothing in the standard guarantees that hive<int> {1,2, 3} == hive<int> {1,2, 3} if == is unordered. While that is an unlikely case, if you assume two hives with exactly the same operations performed on them, but in different translations units, then it is possible the implementation has changed between them while still being ABI compatible, and because of inlining different algorithms were used to add/remove/shrink_to_fit... I think we want to preserve the freedom of implementors to change their algorithm and thus unordered operator== cannot exist lest changing algorithms break existing code, or force on our vendors ABI break style upgrades for internal algorithm changes.

Ordered operator== is a lot more complex. Is operator== const, non-const, or both?

Const means that it is allocating memory someplace for the sort and thus has weird memory behavior. operator== that can throw an out_of_memory exception is surprising. Worse, the sorted result - which may be useful is thrown away. It seems to me that someone who wants == is probably going to do a number of them in nearby code places often without any other operations. Thus we should encourage the explicate copy of the container (possibly to a vector): the memory is going to be allocated anyway, so making it clear that a copy happens seems better to me and the only cost is one extra line of code.

non-const is weird. It invalidates iterators: I think many who were on the "whatever" side of this debate will be firmly against this one for that reason. SG20 needs to be consulted on this one as well. There are other surprise related to operator== mutating the container as well, I think they are all places where you could have used an iterator but didn't.

non-const also means you cannot compare const hives - this is another place where C++ would be surprising and bring out more blog posts about how C++ is a bad language.

Both means you have you get different behavior in different situations - and this seems bug prone. In C++98 getting a const_iterator from a non-const container was ugly and rarely done correctly - we will get similar complaints about it being hard to get the correct == from a non-const hive.

If someone makes a good argument that they want to compare hive, then it should be with functions that make the behavior clear. Sort_and_compare_hives, Compare_sorted_hives (with a contract that it is sorted), and maybe others. You can bike shed those names, but they make it clear what is happening and thus reviewers can debate if the right choice was made, while == will be something reviewers will skip over without realizing it is important (unless they have been burned).

We probably should clean up the above arguments and put them directly into the paper (an appendix) so that they are easy to find in 10 years when this comes up again.

-- 
  Henry Miller
  hank_at_[hidden]
On Mon, Mar 14, 2022, at 19:20, Matt Bentley via SG14 wrote:
> Hi Everyone,
> quick question:
> colony/hive is unordered in terms of insertion position (but sortable),
> should the ==/!= operator also be unordered ie.
> should hive<int> {1,2, 3} == hive<int> {3, 1, 2}?
>
> My main interest is not which is more 'correct' but which is more useful 
> in practice?
> I don't use == operator for containers a lot so I don't know.
> Be aware this will affect >= < etc also.
>
> The unordered version of the operator is trivial, but slower -
> it allocates pointers to each of the elements in each of the containers, 
> then sorts the pointers via the values they point to, then does a 
> standard == left-hand/right-hand compare via the pointers.
> This means (a) == allocates, (b) == is slower due to jumping around in 
> memory and not necessarily following element sequence.
>
> Opinions?
>
> BTW lewg hive discussion next week, for those who are a part of lewg 
> already.
> m@
> _______________________________________________
> SG14 mailing list
> SG14_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg14

Received on 2022-03-15 13:45:40