Date: Thu, 17 Mar 2022 21:03:57 +1300

... And I forgot the attachment. attached.

M

On Thu, 17 Mar 2022 at 21:00, Matthew Bentley <mattreecebentley_at_[hidden]>

wrote:

> Thanks Arthur, Tony & Ville - you all make good points. Thanks in

> particular to Tony & Ville for arguing it out, though I think it's

> probably not worth the intellectual fisticuffs at this point :)

>

> Arthur, good to know that the algorithm for order-insensitive

> comparison is a standardised algorithm that can be used with hive.

>

> My thoughts are as follows:

>

> * An order-sensitive comparison is the only 'cheap' option, which,

> while in line with one of the main selling points of the container,

> will inevitably confuse the everloving crap out of newcomers, due to

> situations like:

> hive<int> t = {1, 2, 3, 4, 5}, t2 = {6, 1, 2, 3, 4};

> t2.erase(t2.begin());

> t2.insert(5);

>

> Whereby teh newb will assume that t1 == t2 now, when in fact the

> sequence is most likely to be {5, 1, 2, 3, 4} (but this is not

> guaranteed, only guarantee is that the item is inserted, the location

> is an implementation decision - maybe the implementation decides to

> use up unused spaces in the block before reusing erased spaces).

>

> * An order-insensitive comparison would, I think, be wise Not to

> allocate, as for the majority of SG14 fields (particularly embedded)

> this is very much not a free operation, and naive use of == comparison

> may lead to some significant snafu's.

>

> * The standard approach for order-insensitive containers as Arthur

> mentioned is std::is_permutation(a.begin(), a.end(), b.begin(),

> b.end());

> which is slow but probably the only sensible approach, should we

> choose to retain ==. Best case is the sizes don't match, then it's an

> O(1) operation, worst case is O(n pow 2), which is mind-blowing to me,

> but probably non-problematic for small data sets, and in reality only

> sets which differ in their final element fulfill that complexity

> guarantee.

>

> Given all of the above, my leaning is toward either (a) retaining an

> order-insensitive == operator based on the standard approach, and

> educating users as to this being a non-trivial operation which should

> only be used sparingly, or (b) removing ==.

>

> I would suggest trying:

> (a) the obvious size comparison (O(1)), followed by

> (b) Ville's suggestion of trying the order-sensitive comparison first

> (O(n)), then only if that fails,

> (c) std::is_permutation(a.begin(), a.end(), b.begin(), b.end()); (O(n pow

> 2))

>

> Statistical benchmarking with/without the order-sensitive comparison

> across a large number of data sets should show whether that's worth

> including, but that's an implementator's prerogative.

>

> I would also strongly support taking my approach (pointer-indirected

> order-insensitive comparison which allows for allocation) and putting

> that into a separate std:: algorithm - this can be used with any

> container, not just hive. std::is_permutation_allocation_allowed or

> something more terse. To answer Pedro Oliveira's question (he replied

> to the digest post by mistake), I don't have any statistics on what

> the performance is like vs order-insensitive comparison, but if anyone

> wants to benchmark it the version of hive with my allocating unordered

> == function's attached.

>

> A few additional queries, if case anyone has an opinion on them:

>

> * The sort() function of hive (which, in my implementation, uses a

> similar mechanism to the pointer indirection above) is currently

> specified as being allowed to allocate. Is this permissible to SG14?

>

> * I think it should probably be specified that copying a container

> Cannot change the iteration order. There is no legitimate performance

> reason not to specify this (as far as I know), as iterative order is

> also memory locality order, hence iterative order is also the fastest

> way to copy. This also makes the copying of sort()'ed hives less

> problematic. If anyone has any objections to this let me know.

>

> Thanks all once again-

> M@

>

> On 17/03/2022, Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]> wrote:

> > A few notes:

> > - Some people in this thread have used the adjectives "ordered" and

> > "unordered" to talk about operator=='s behavior. I believe those terms

> are

> > ambiguous. I used "order-sensitive" and "order-insensitive" to try to

> deal

> > with that ambiguity.

> > - Every STL container has a (possibly non-deterministic) *iteration

> order*,

>

>

> > i.e., the order in which the elements are traversed from .begin() to

> > .end(). Your only job is to decide whether that iteration order is

> salient

> > or not. (For a vector or list, the iteration order is obviously salient.

> > For a std::unordered_set, the iteration order is obviously non-salient.

> For

> > a std::set, the iteration order is fixed so conceptually it doesn't

> matter

> > whether it's salient or not, at least for non-pathological element

> types.)

> > - "Order-sensitive equality comparison" is always implementable as

> > bool operator==(const C& a, const C& b) { return

> std::equal(a.begin(),

> > a.end(), b.begin(), b.end()); }

> > Sequence and associative containers mandate exactly this implementation:

> > https://eel.is/c++draft/container.reqmts#lib:operator==,containers

> > - "Order-insensitive equality comparison" is always implementable as

> > bool operator==(const C& a, const C& b) { return

> > std::is_permutation(a.begin(), a.end(), b.begin(), b.end()); }

> > Unordered containers mandate a slight performance optimization of this

> > implementation: https://eel.is/c++draft/unord.req#general-237

> > - If copying a container can change the iteration order, then iteration

> > order must not be salient, by definition (because, by definition, a copy

> > copies all the salient properties of the original). In other words, if

> > `auto b = a; assert(std::equal(a.begin(), a.end(), b.begin(), b.end()));`

> > can fail in practice, then iteration order must not be salient, by

> > definition.

> >

> >

> > On Wed, Mar 16, 2022 at 3:02 AM Matthew Bentley via SG14 <

> > sg14_at_[hidden]> wrote:

> >

> >> On Wed, 16 Mar 2022 at 12:24, Nevin Liber <nevin_at_[hidden]>

> wrote:

> >>

> >>>

> >>> My opinion is that it should have one, and if it does, the expectation

> >>> is

> >>> that it is unordered.

> >>>

> >>> For most containers (other than string), I rarely use its comparison

> >>> operators, but when I need them, they are very handy.

> >>>

> >>> We are supposed to be designing a coherent standard library, and all

> the

> >>> containers in the standard library are equality comparable. This

> should

> >>> not be a snowflake.

> >>>

> >>> How does the complexity compare with the complexity of

> >>> unordered_multiset::operator==?

> >>>

> >>

> >> My apologies to Henry Miller from earlier, time complexity would be

> >> approx

> >> O(n log n), as he states - I forgot the time complexity of the sort

> >> function itself. At any rate, 1 x O(n) (gather pointers), 1 x O(n log n)

> >> (sort pointers), 1 x O(n) (compare elements via pointers).

> >> There may be better ways of doing what I'm talking about, this is only

> >> what I've managed to come up with thus far.

> >> Whereas unordered_multiset::operator==is O(n pow 2) in worst case.

> >>

> >> _______________________________________________

> >> SG14 mailing list

> >> SG14_at_[hidden]

> >> https://lists.isocpp.org/mailman/listinfo.cgi/sg14

> >>

> >

>

M

On Thu, 17 Mar 2022 at 21:00, Matthew Bentley <mattreecebentley_at_[hidden]>

wrote:

> Thanks Arthur, Tony & Ville - you all make good points. Thanks in

> particular to Tony & Ville for arguing it out, though I think it's

> probably not worth the intellectual fisticuffs at this point :)

>

> Arthur, good to know that the algorithm for order-insensitive

> comparison is a standardised algorithm that can be used with hive.

>

> My thoughts are as follows:

>

> * An order-sensitive comparison is the only 'cheap' option, which,

> while in line with one of the main selling points of the container,

> will inevitably confuse the everloving crap out of newcomers, due to

> situations like:

> hive<int> t = {1, 2, 3, 4, 5}, t2 = {6, 1, 2, 3, 4};

> t2.erase(t2.begin());

> t2.insert(5);

>

> Whereby teh newb will assume that t1 == t2 now, when in fact the

> sequence is most likely to be {5, 1, 2, 3, 4} (but this is not

> guaranteed, only guarantee is that the item is inserted, the location

> is an implementation decision - maybe the implementation decides to

> use up unused spaces in the block before reusing erased spaces).

>

> * An order-insensitive comparison would, I think, be wise Not to

> allocate, as for the majority of SG14 fields (particularly embedded)

> this is very much not a free operation, and naive use of == comparison

> may lead to some significant snafu's.

>

> * The standard approach for order-insensitive containers as Arthur

> mentioned is std::is_permutation(a.begin(), a.end(), b.begin(),

> b.end());

> which is slow but probably the only sensible approach, should we

> choose to retain ==. Best case is the sizes don't match, then it's an

> O(1) operation, worst case is O(n pow 2), which is mind-blowing to me,

> but probably non-problematic for small data sets, and in reality only

> sets which differ in their final element fulfill that complexity

> guarantee.

>

> Given all of the above, my leaning is toward either (a) retaining an

> order-insensitive == operator based on the standard approach, and

> educating users as to this being a non-trivial operation which should

> only be used sparingly, or (b) removing ==.

>

> I would suggest trying:

> (a) the obvious size comparison (O(1)), followed by

> (b) Ville's suggestion of trying the order-sensitive comparison first

> (O(n)), then only if that fails,

> (c) std::is_permutation(a.begin(), a.end(), b.begin(), b.end()); (O(n pow

> 2))

>

> Statistical benchmarking with/without the order-sensitive comparison

> across a large number of data sets should show whether that's worth

> including, but that's an implementator's prerogative.

>

> I would also strongly support taking my approach (pointer-indirected

> order-insensitive comparison which allows for allocation) and putting

> that into a separate std:: algorithm - this can be used with any

> container, not just hive. std::is_permutation_allocation_allowed or

> something more terse. To answer Pedro Oliveira's question (he replied

> to the digest post by mistake), I don't have any statistics on what

> the performance is like vs order-insensitive comparison, but if anyone

> wants to benchmark it the version of hive with my allocating unordered

> == function's attached.

>

> A few additional queries, if case anyone has an opinion on them:

>

> * The sort() function of hive (which, in my implementation, uses a

> similar mechanism to the pointer indirection above) is currently

> specified as being allowed to allocate. Is this permissible to SG14?

>

> * I think it should probably be specified that copying a container

> Cannot change the iteration order. There is no legitimate performance

> reason not to specify this (as far as I know), as iterative order is

> also memory locality order, hence iterative order is also the fastest

> way to copy. This also makes the copying of sort()'ed hives less

> problematic. If anyone has any objections to this let me know.

>

> Thanks all once again-

> M@

>

> On 17/03/2022, Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]> wrote:

> > A few notes:

> > - Some people in this thread have used the adjectives "ordered" and

> > "unordered" to talk about operator=='s behavior. I believe those terms

> are

> > ambiguous. I used "order-sensitive" and "order-insensitive" to try to

> deal

> > with that ambiguity.

> > - Every STL container has a (possibly non-deterministic) *iteration

> order*,

>

>

> > i.e., the order in which the elements are traversed from .begin() to

> > .end(). Your only job is to decide whether that iteration order is

> salient

> > or not. (For a vector or list, the iteration order is obviously salient.

> > For a std::unordered_set, the iteration order is obviously non-salient.

> For

> > a std::set, the iteration order is fixed so conceptually it doesn't

> matter

> > whether it's salient or not, at least for non-pathological element

> types.)

> > - "Order-sensitive equality comparison" is always implementable as

> > bool operator==(const C& a, const C& b) { return

> std::equal(a.begin(),

> > a.end(), b.begin(), b.end()); }

> > Sequence and associative containers mandate exactly this implementation:

> > https://eel.is/c++draft/container.reqmts#lib:operator==,containers

> > - "Order-insensitive equality comparison" is always implementable as

> > bool operator==(const C& a, const C& b) { return

> > std::is_permutation(a.begin(), a.end(), b.begin(), b.end()); }

> > Unordered containers mandate a slight performance optimization of this

> > implementation: https://eel.is/c++draft/unord.req#general-237

> > - If copying a container can change the iteration order, then iteration

> > order must not be salient, by definition (because, by definition, a copy

> > copies all the salient properties of the original). In other words, if

> > `auto b = a; assert(std::equal(a.begin(), a.end(), b.begin(), b.end()));`

> > can fail in practice, then iteration order must not be salient, by

> > definition.

> >

> >

> > On Wed, Mar 16, 2022 at 3:02 AM Matthew Bentley via SG14 <

> > sg14_at_[hidden]> wrote:

> >

> >> On Wed, 16 Mar 2022 at 12:24, Nevin Liber <nevin_at_[hidden]>

> wrote:

> >>

> >>>

> >>> My opinion is that it should have one, and if it does, the expectation

> >>> is

> >>> that it is unordered.

> >>>

> >>> For most containers (other than string), I rarely use its comparison

> >>> operators, but when I need them, they are very handy.

> >>>

> >>> We are supposed to be designing a coherent standard library, and all

> the

> >>> containers in the standard library are equality comparable. This

> should

> >>> not be a snowflake.

> >>>

> >>> How does the complexity compare with the complexity of

> >>> unordered_multiset::operator==?

> >>>

> >>

> >> My apologies to Henry Miller from earlier, time complexity would be

> >> approx

> >> O(n log n), as he states - I forgot the time complexity of the sort

> >> function itself. At any rate, 1 x O(n) (gather pointers), 1 x O(n log n)

> >> (sort pointers), 1 x O(n) (compare elements via pointers).

> >> There may be better ways of doing what I'm talking about, this is only

> >> what I've managed to come up with thus far.

> >> Whereas unordered_multiset::operator==is O(n pow 2) in worst case.

> >>

> >> _______________________________________________

> >> SG14 mailing list

> >> SG14_at_[hidden]

> >> https://lists.isocpp.org/mailman/listinfo.cgi/sg14

> >>

> >

>

Received on 2022-03-17 08:04:33