Date: Thu, 17 Mar 2022 21:00:42 +1300

Thanks Arthur, Tony & Ville - you all make good points. Thanks in

particular to Tony & Ville for arguing it out, though I think it's

probably not worth the intellectual fisticuffs at this point :)

Arthur, good to know that the algorithm for order-insensitive

comparison is a standardised algorithm that can be used with hive.

My thoughts are as follows:

* An order-sensitive comparison is the only 'cheap' option, which,

while in line with one of the main selling points of the container,

will inevitably confuse the everloving crap out of newcomers, due to

situations like:

hive<int> t = {1, 2, 3, 4, 5}, t2 = {6, 1, 2, 3, 4};

t2.erase(t2.begin());

t2.insert(5);

Whereby teh newb will assume that t1 == t2 now, when in fact the

sequence is most likely to be {5, 1, 2, 3, 4} (but this is not

guaranteed, only guarantee is that the item is inserted, the location

is an implementation decision - maybe the implementation decides to

use up unused spaces in the block before reusing erased spaces).

* An order-insensitive comparison would, I think, be wise Not to

allocate, as for the majority of SG14 fields (particularly embedded)

this is very much not a free operation, and naive use of == comparison

may lead to some significant snafu's.

* The standard approach for order-insensitive containers as Arthur

mentioned is std::is_permutation(a.begin(), a.end(), b.begin(),

b.end());

which is slow but probably the only sensible approach, should we

choose to retain ==. Best case is the sizes don't match, then it's an

O(1) operation, worst case is O(n pow 2), which is mind-blowing to me,

but probably non-problematic for small data sets, and in reality only

sets which differ in their final element fulfill that complexity

guarantee.

Given all of the above, my leaning is toward either (a) retaining an

order-insensitive == operator based on the standard approach, and

educating users as to this being a non-trivial operation which should

only be used sparingly, or (b) removing ==.

I would suggest trying:

(a) the obvious size comparison (O(1)), followed by

(b) Ville's suggestion of trying the order-sensitive comparison first

(O(n)), then only if that fails,

(c) std::is_permutation(a.begin(), a.end(), b.begin(), b.end()); (O(n pow 2))

Statistical benchmarking with/without the order-sensitive comparison

across a large number of data sets should show whether that's worth

including, but that's an implementator's prerogative.

I would also strongly support taking my approach (pointer-indirected

order-insensitive comparison which allows for allocation) and putting

that into a separate std:: algorithm - this can be used with any

container, not just hive. std::is_permutation_allocation_allowed or

something more terse. To answer Pedro Oliveira's question (he replied

to the digest post by mistake), I don't have any statistics on what

the performance is like vs order-insensitive comparison, but if anyone

wants to benchmark it the version of hive with my allocating unordered

== function's attached.

A few additional queries, if case anyone has an opinion on them:

* The sort() function of hive (which, in my implementation, uses a

similar mechanism to the pointer indirection above) is currently

specified as being allowed to allocate. Is this permissible to SG14?

* I think it should probably be specified that copying a container

Cannot change the iteration order. There is no legitimate performance

reason not to specify this (as far as I know), as iterative order is

also memory locality order, hence iterative order is also the fastest

way to copy. This also makes the copying of sort()'ed hives less

problematic. If anyone has any objections to this let me know.

Thanks all once again-

M@

On 17/03/2022, Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]> wrote:

> A few notes:

> - Some people in this thread have used the adjectives "ordered" and

> "unordered" to talk about operator=='s behavior. I believe those terms are

> ambiguous. I used "order-sensitive" and "order-insensitive" to try to deal

> with that ambiguity.

> - Every STL container has a (possibly non-deterministic) *iteration order*,

> i.e., the order in which the elements are traversed from .begin() to

> .end(). Your only job is to decide whether that iteration order is salient

> or not. (For a vector or list, the iteration order is obviously salient.

> For a std::unordered_set, the iteration order is obviously non-salient. For

> a std::set, the iteration order is fixed so conceptually it doesn't matter

> whether it's salient or not, at least for non-pathological element types.)

> - "Order-sensitive equality comparison" is always implementable as

> bool operator==(const C& a, const C& b) { return std::equal(a.begin(),

> a.end(), b.begin(), b.end()); }

> Sequence and associative containers mandate exactly this implementation:

> https://eel.is/c++draft/container.reqmts#lib:operator==,containers

> - "Order-insensitive equality comparison" is always implementable as

> bool operator==(const C& a, const C& b) { return

> std::is_permutation(a.begin(), a.end(), b.begin(), b.end()); }

> Unordered containers mandate a slight performance optimization of this

> implementation: https://eel.is/c++draft/unord.req#general-237

> - If copying a container can change the iteration order, then iteration

> order must not be salient, by definition (because, by definition, a copy

> copies all the salient properties of the original). In other words, if

> `auto b = a; assert(std::equal(a.begin(), a.end(), b.begin(), b.end()));`

> can fail in practice, then iteration order must not be salient, by

> definition.

>

>

> On Wed, Mar 16, 2022 at 3:02 AM Matthew Bentley via SG14 <

> sg14_at_[hidden]> wrote:

>

>> On Wed, 16 Mar 2022 at 12:24, Nevin Liber <nevin_at_[hidden]> wrote:

>>

>>>

>>> My opinion is that it should have one, and if it does, the expectation

>>> is

>>> that it is unordered.

>>>

>>> For most containers (other than string), I rarely use its comparison

>>> operators, but when I need them, they are very handy.

>>>

>>> We are supposed to be designing a coherent standard library, and all the

>>> containers in the standard library are equality comparable. This should

>>> not be a snowflake.

>>>

>>> How does the complexity compare with the complexity of

>>> unordered_multiset::operator==?

>>>

>>

>> My apologies to Henry Miller from earlier, time complexity would be

>> approx

>> O(n log n), as he states - I forgot the time complexity of the sort

>> function itself. At any rate, 1 x O(n) (gather pointers), 1 x O(n log n)

>> (sort pointers), 1 x O(n) (compare elements via pointers).

>> There may be better ways of doing what I'm talking about, this is only

>> what I've managed to come up with thus far.

>> Whereas unordered_multiset::operator==is O(n pow 2) in worst case.

>>

>> _______________________________________________

>> SG14 mailing list

>> SG14_at_[hidden]

>> https://lists.isocpp.org/mailman/listinfo.cgi/sg14

>>

>

particular to Tony & Ville for arguing it out, though I think it's

probably not worth the intellectual fisticuffs at this point :)

Arthur, good to know that the algorithm for order-insensitive

comparison is a standardised algorithm that can be used with hive.

My thoughts are as follows:

* An order-sensitive comparison is the only 'cheap' option, which,

while in line with one of the main selling points of the container,

will inevitably confuse the everloving crap out of newcomers, due to

situations like:

hive<int> t = {1, 2, 3, 4, 5}, t2 = {6, 1, 2, 3, 4};

t2.erase(t2.begin());

t2.insert(5);

Whereby teh newb will assume that t1 == t2 now, when in fact the

sequence is most likely to be {5, 1, 2, 3, 4} (but this is not

guaranteed, only guarantee is that the item is inserted, the location

is an implementation decision - maybe the implementation decides to

use up unused spaces in the block before reusing erased spaces).

* An order-insensitive comparison would, I think, be wise Not to

allocate, as for the majority of SG14 fields (particularly embedded)

this is very much not a free operation, and naive use of == comparison

may lead to some significant snafu's.

* The standard approach for order-insensitive containers as Arthur

mentioned is std::is_permutation(a.begin(), a.end(), b.begin(),

b.end());

which is slow but probably the only sensible approach, should we

choose to retain ==. Best case is the sizes don't match, then it's an

O(1) operation, worst case is O(n pow 2), which is mind-blowing to me,

but probably non-problematic for small data sets, and in reality only

sets which differ in their final element fulfill that complexity

guarantee.

Given all of the above, my leaning is toward either (a) retaining an

order-insensitive == operator based on the standard approach, and

educating users as to this being a non-trivial operation which should

only be used sparingly, or (b) removing ==.

I would suggest trying:

(a) the obvious size comparison (O(1)), followed by

(b) Ville's suggestion of trying the order-sensitive comparison first

(O(n)), then only if that fails,

(c) std::is_permutation(a.begin(), a.end(), b.begin(), b.end()); (O(n pow 2))

Statistical benchmarking with/without the order-sensitive comparison

across a large number of data sets should show whether that's worth

including, but that's an implementator's prerogative.

I would also strongly support taking my approach (pointer-indirected

order-insensitive comparison which allows for allocation) and putting

that into a separate std:: algorithm - this can be used with any

container, not just hive. std::is_permutation_allocation_allowed or

something more terse. To answer Pedro Oliveira's question (he replied

to the digest post by mistake), I don't have any statistics on what

the performance is like vs order-insensitive comparison, but if anyone

wants to benchmark it the version of hive with my allocating unordered

== function's attached.

A few additional queries, if case anyone has an opinion on them:

* The sort() function of hive (which, in my implementation, uses a

similar mechanism to the pointer indirection above) is currently

specified as being allowed to allocate. Is this permissible to SG14?

* I think it should probably be specified that copying a container

Cannot change the iteration order. There is no legitimate performance

reason not to specify this (as far as I know), as iterative order is

also memory locality order, hence iterative order is also the fastest

way to copy. This also makes the copying of sort()'ed hives less

problematic. If anyone has any objections to this let me know.

Thanks all once again-

M@

On 17/03/2022, Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]> wrote:

> A few notes:

> - Some people in this thread have used the adjectives "ordered" and

> "unordered" to talk about operator=='s behavior. I believe those terms are

> ambiguous. I used "order-sensitive" and "order-insensitive" to try to deal

> with that ambiguity.

> - Every STL container has a (possibly non-deterministic) *iteration order*,

> i.e., the order in which the elements are traversed from .begin() to

> .end(). Your only job is to decide whether that iteration order is salient

> or not. (For a vector or list, the iteration order is obviously salient.

> For a std::unordered_set, the iteration order is obviously non-salient. For

> a std::set, the iteration order is fixed so conceptually it doesn't matter

> whether it's salient or not, at least for non-pathological element types.)

> - "Order-sensitive equality comparison" is always implementable as

> bool operator==(const C& a, const C& b) { return std::equal(a.begin(),

> a.end(), b.begin(), b.end()); }

> Sequence and associative containers mandate exactly this implementation:

> https://eel.is/c++draft/container.reqmts#lib:operator==,containers

> - "Order-insensitive equality comparison" is always implementable as

> bool operator==(const C& a, const C& b) { return

> std::is_permutation(a.begin(), a.end(), b.begin(), b.end()); }

> Unordered containers mandate a slight performance optimization of this

> implementation: https://eel.is/c++draft/unord.req#general-237

> - If copying a container can change the iteration order, then iteration

> order must not be salient, by definition (because, by definition, a copy

> copies all the salient properties of the original). In other words, if

> `auto b = a; assert(std::equal(a.begin(), a.end(), b.begin(), b.end()));`

> can fail in practice, then iteration order must not be salient, by

> definition.

>

>

> On Wed, Mar 16, 2022 at 3:02 AM Matthew Bentley via SG14 <

> sg14_at_[hidden]> wrote:

>

>> On Wed, 16 Mar 2022 at 12:24, Nevin Liber <nevin_at_[hidden]> wrote:

>>

>>>

>>> My opinion is that it should have one, and if it does, the expectation

>>> is

>>> that it is unordered.

>>>

>>> For most containers (other than string), I rarely use its comparison

>>> operators, but when I need them, they are very handy.

>>>

>>> We are supposed to be designing a coherent standard library, and all the

>>> containers in the standard library are equality comparable. This should

>>> not be a snowflake.

>>>

>>> How does the complexity compare with the complexity of

>>> unordered_multiset::operator==?

>>>

>>

>> My apologies to Henry Miller from earlier, time complexity would be

>> approx

>> O(n log n), as he states - I forgot the time complexity of the sort

>> function itself. At any rate, 1 x O(n) (gather pointers), 1 x O(n log n)

>> (sort pointers), 1 x O(n) (compare elements via pointers).

>> There may be better ways of doing what I'm talking about, this is only

>> what I've managed to come up with thus far.

>> Whereas unordered_multiset::operator==is O(n pow 2) in worst case.

>>

>> _______________________________________________

>> SG14 mailing list

>> SG14_at_[hidden]

>> https://lists.isocpp.org/mailman/listinfo.cgi/sg14

>>

>

Received on 2022-03-17 08:01:18