Re: Colony/hive & unordered vs ordered == operator

From: Matthew Bentley <mattreecebentley_at_[hidden]>
Date: Thu, 17 Mar 2022 21:03:57 +1300
... And I forgot the attachment. attached.
M

On Thu, 17 Mar 2022 at 21:00, Matthew Bentley <mattreecebentley_at_[hidden]>
wrote:

> Thanks Arthur, Tony & Ville - you all make good points. Thanks in
> particular to Tony & Ville for arguing it out, though I think it's
> probably not worth the intellectual fisticuffs at this point :)
>
> Arthur, good to know that the algorithm for order-insensitive
> comparison is a standardised algorithm that can be used with hive.
>
> My thoughts are as follows:
>
> * An order-sensitive comparison is the only 'cheap' option, which,
> while in line with one of the main selling points of the container,
> will inevitably confuse the everloving crap out of newcomers, due to
> situations like:
> hive<int> t = {1, 2, 3, 4, 5}, t2 = {6, 1, 2, 3, 4};
> t2.erase(t2.begin());
> t2.insert(5);
>
> Whereby teh newb will assume that t1 == t2 now, when in fact the
> sequence is most likely to be {5, 1, 2, 3, 4} (but this is not
> guaranteed, only guarantee is that the item is inserted, the location
> is an implementation decision - maybe the implementation decides to
> use up unused spaces in the block before reusing erased spaces).
>
> * An order-insensitive comparison would, I think, be wise Not to
> allocate, as for the majority of SG14 fields (particularly embedded)
> this is very much not a free operation, and naive use of == comparison
> may lead to some significant snafu's.
>
> * The standard approach for order-insensitive containers as Arthur
> mentioned is std::is_permutation(a.begin(), a.end(), b.begin(),
> b.end());
> which is slow but probably the only sensible approach, should we
> choose to retain ==. Best case is the sizes don't match, then it's an
> O(1) operation, worst case is O(n pow 2), which is mind-blowing to me,
> but probably non-problematic for small data sets, and in reality only
> sets which differ in their final element fulfill that complexity
> guarantee.
>
> Given all of the above, my leaning is toward either (a) retaining an
> order-insensitive == operator based on the standard approach, and
> educating users as to this being a non-trivial operation which should
> only be used sparingly, or (b) removing ==.
>
> I would suggest trying:
> (a) the obvious size comparison (O(1)), followed by
> (b) Ville's suggestion of trying the order-sensitive comparison first
> (O(n)), then only if that fails,
> (c) std::is_permutation(a.begin(), a.end(), b.begin(), b.end()); (O(n pow
> 2))
>
> Statistical benchmarking with/without the order-sensitive comparison
> across a large number of data sets should show whether that's worth
> including, but that's an implementator's prerogative.
>
> I would also strongly support taking my approach (pointer-indirected
> order-insensitive comparison which allows for allocation) and putting
> that into a separate std:: algorithm - this can be used with any
> container, not just hive. std::is_permutation_allocation_allowed or
> something more terse. To answer Pedro Oliveira's question (he replied
> to the digest post by mistake), I don't have any statistics on what
> the performance is like vs order-insensitive comparison, but if anyone
> wants to benchmark it the version of hive with my allocating unordered
> == function's attached.
>
> A few additional queries, if case anyone has an opinion on them:
>
> * The sort() function of hive (which, in my implementation, uses a
> similar mechanism to the pointer indirection above) is currently
> specified as being allowed to allocate. Is this permissible to SG14?
>
> * I think it should probably be specified that copying a container
> Cannot change the iteration order. There is no legitimate performance
> reason not to specify this (as far as I know), as iterative order is
> also memory locality order, hence iterative order is also the fastest
> way to copy. This also makes the copying of sort()'ed hives less
> problematic. If anyone has any objections to this let me know.
>
> Thanks all once again-
> M@
>
> On 17/03/2022, Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]> wrote:
> > A few notes:
> > - Some people in this thread have used the adjectives "ordered" and
> > "unordered" to talk about operator=='s behavior. I believe those terms
> are
> > ambiguous. I used "order-sensitive" and "order-insensitive" to try to
> deal
> > with that ambiguity.
> > - Every STL container has a (possibly non-deterministic) *iteration
> order*,
>
>
> > i.e., the order in which the elements are traversed from .begin() to
> > .end(). Your only job is to decide whether that iteration order is
> salient
> > or not. (For a vector or list, the iteration order is obviously salient.
> > For a std::unordered_set, the iteration order is obviously non-salient.
> For
> > a std::set, the iteration order is fixed so conceptually it doesn't
> matter
> > whether it's salient or not, at least for non-pathological element
> types.)
> > - "Order-sensitive equality comparison" is always implementable as
> > bool operator==(const C& a, const C& b) { return
> std::equal(a.begin(),
> > a.end(), b.begin(), b.end()); }
> > Sequence and associative containers mandate exactly this implementation:

> > - "Order-insensitive equality comparison" is always implementable as
> > bool operator==(const C& a, const C& b) { return
> > std::is_permutation(a.begin(), a.end(), b.begin(), b.end()); }
> > Unordered containers mandate a slight performance optimization of this
> > implementation: https://eel.is/c++draft/unord.req#general-237
> > - If copying a container can change the iteration order, then iteration
> > order must not be salient, by definition (because, by definition, a copy
> > copies all the salient properties of the original). In other words, if
> > `auto b = a; assert(std::equal(a.begin(), a.end(), b.begin(), b.end()));`
> > can fail in practice, then iteration order must not be salient, by
> > definition.
> >
> >
> > On Wed, Mar 16, 2022 at 3:02 AM Matthew Bentley via SG14 <
> > sg14_at_[hidden]> wrote:
> >
> >> On Wed, 16 Mar 2022 at 12:24, Nevin Liber <nevin_at_[hidden]>
> wrote:
> >>
> >>>
> >>> My opinion is that it should have one, and if it does, the expectation
> >>> is
> >>> that it is unordered.
> >>>
> >>> For most containers (other than string), I rarely use its comparison
> >>> operators, but when I need them, they are very handy.
> >>>
> >>> We are supposed to be designing a coherent standard library, and all
> the
> >>> containers in the standard library are equality comparable. This
> should
> >>> not be a snowflake.
> >>>
> >>> How does the complexity compare with the complexity of
> >>> unordered_multiset::operator==?
> >>>
> >>
> >> My apologies to Henry Miller from earlier, time complexity would be
> >> approx
> >> O(n log n), as he states - I forgot the time complexity of the sort
> >> function itself. At any rate, 1 x O(n) (gather pointers), 1 x O(n log n)
> >> (sort pointers), 1 x O(n) (compare elements via pointers).
> >> There may be better ways of doing what I'm talking about, this is only
> >> what I've managed to come up with thus far.
> >> Whereas unordered_multiset::operator==is O(n pow 2) in worst case.
> >>
> >> _______________________________________________
> >> SG14 mailing list
> >> SG14_at_[hidden]

> >>
> >
>