ISOCPP SG14 List: Re: Colony/hive & unordered vs ordered == operator

From: Matthew Bentley <mattreecebentley_at_[hidden]>
Date: Thu, 17 Mar 2022 21:00:42 +1300

Thanks Arthur, Tony & Ville - you all make good points. Thanks in
particular to Tony & Ville for arguing it out, though I think it's
probably not worth the intellectual fisticuffs at this point :)

Arthur, good to know that the algorithm for order-insensitive
comparison is a standardised algorithm that can be used with hive.

My thoughts are as follows:

* An order-sensitive comparison is the only 'cheap' option, which,
while in line with one of the main selling points of the container,
will inevitably confuse the everloving crap out of newcomers, due to
situations like:
hive<int> t = {1, 2, 3, 4, 5}, t2 = {6, 1, 2, 3, 4};
t2.erase(t2.begin());
t2.insert(5);

Whereby teh newb will assume that t1 == t2 now, when in fact the
sequence is most likely to be {5, 1, 2, 3, 4} (but this is not
guaranteed, only guarantee is that the item is inserted, the location
is an implementation decision - maybe the implementation decides to
use up unused spaces in the block before reusing erased spaces).

* An order-insensitive comparison would, I think, be wise Not to
allocate, as for the majority of SG14 fields (particularly embedded)
this is very much not a free operation, and naive use of == comparison
may lead to some significant snafu's.

* The standard approach for order-insensitive containers as Arthur
mentioned is std::is_permutation(a.begin(), a.end(), b.begin(),
b.end());
which is slow but probably the only sensible approach, should we
choose to retain ==. Best case is the sizes don't match, then it's an
O(1) operation, worst case is O(n pow 2), which is mind-blowing to me,
but probably non-problematic for small data sets, and in reality only
sets which differ in their final element fulfill that complexity
guarantee.

Given all of the above, my leaning is toward either (a) retaining an
order-insensitive == operator based on the standard approach, and
educating users as to this being a non-trivial operation which should
only be used sparingly, or (b) removing ==.

I would suggest trying:
(a) the obvious size comparison (O(1)), followed by
(b) Ville's suggestion of trying the order-sensitive comparison first
(O(n)), then only if that fails,
(c) std::is_permutation(a.begin(), a.end(), b.begin(), b.end()); (O(n pow 2))

Statistical benchmarking with/without the order-sensitive comparison
across a large number of data sets should show whether that's worth
including, but that's an implementator's prerogative.

I would also strongly support taking my approach (pointer-indirected
order-insensitive comparison which allows for allocation) and putting
that into a separate std:: algorithm - this can be used with any
container, not just hive. std::is_permutation_allocation_allowed or
something more terse. To answer Pedro Oliveira's question (he replied
to the digest post by mistake), I don't have any statistics on what
the performance is like vs order-insensitive comparison, but if anyone
wants to benchmark it the version of hive with my allocating unordered
== function's attached.

A few additional queries, if case anyone has an opinion on them:

* The sort() function of hive (which, in my implementation, uses a
similar mechanism to the pointer indirection above) is currently
specified as being allowed to allocate. Is this permissible to SG14?

* I think it should probably be specified that copying a container
Cannot change the iteration order. There is no legitimate performance
reason not to specify this (as far as I know), as iterative order is
also memory locality order, hence iterative order is also the fastest
way to copy. This also makes the copying of sort()'ed hives less
problematic. If anyone has any objections to this let me know.

Thanks all once again-
M@

On 17/03/2022, Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]> wrote:
> A few notes:
> - Some people in this thread have used the adjectives "ordered" and
> "unordered" to talk about operator=='s behavior. I believe those terms are
> ambiguous. I used "order-sensitive" and "order-insensitive" to try to deal
> with that ambiguity.
> - Every STL container has a (possibly non-deterministic) *iteration order*,

> i.e., the order in which the elements are traversed from .begin() to
> .end(). Your only job is to decide whether that iteration order is salient
> or not. (For a vector or list, the iteration order is obviously salient.
> For a std::unordered_set, the iteration order is obviously non-salient. For
> a std::set, the iteration order is fixed so conceptually it doesn't matter
> whether it's salient or not, at least for non-pathological element types.)
> - "Order-sensitive equality comparison" is always implementable as
> bool operator==(const C& a, const C& b) { return std::equal(a.begin(),
> a.end(), b.begin(), b.end()); }
> Sequence and associative containers mandate exactly this implementation:
> https://eel.is/c++draft/container.reqmts#lib:operator==,containers
> - "Order-insensitive equality comparison" is always implementable as
> bool operator==(const C& a, const C& b) { return
> std::is_permutation(a.begin(), a.end(), b.begin(), b.end()); }
> Unordered containers mandate a slight performance optimization of this
> implementation: https://eel.is/c++draft/unord.req#general-237
> - If copying a container can change the iteration order, then iteration
> order must not be salient, by definition (because, by definition, a copy
> copies all the salient properties of the original). In other words, if
> `auto b = a; assert(std::equal(a.begin(), a.end(), b.begin(), b.end()));`
> can fail in practice, then iteration order must not be salient, by
> definition.
>
>
> On Wed, Mar 16, 2022 at 3:02 AM Matthew Bentley via SG14 <
> sg14_at_[hidden]> wrote:
>
>> On Wed, 16 Mar 2022 at 12:24, Nevin Liber <nevin_at_[hidden]> wrote:
>>
>>>
>>> My opinion is that it should have one, and if it does, the expectation
>>> is
>>> that it is unordered.
>>>
>>> For most containers (other than string), I rarely use its comparison
>>> operators, but when I need them, they are very handy.
>>>
>>> We are supposed to be designing a coherent standard library, and all the
>>> containers in the standard library are equality comparable. This should
>>> not be a snowflake.
>>>
>>> How does the complexity compare with the complexity of
>>> unordered_multiset::operator==?
>>>
>>
>> My apologies to Henry Miller from earlier, time complexity would be
>> approx
>> O(n log n), as he states - I forgot the time complexity of the sort
>> function itself. At any rate, 1 x O(n) (gather pointers), 1 x O(n log n)
>> (sort pointers), 1 x O(n) (compare elements via pointers).
>> There may be better ways of doing what I'm talking about, this is only
>> what I've managed to come up with thus far.
>> Whereas unordered_multiset::operator==is O(n pow 2) in worst case.
>>
>> _______________________________________________
>> SG14 mailing list
>> SG14_at_[hidden]
>> https://lists.isocpp.org/mailman/listinfo.cgi/sg14
>>
>

Received on 2022-03-17 08:01:18