C++ Logo

std-proposals

Advanced search

Re: [std-proposals] A Proposal to Add a New Multiset container Method to the Standard Library Technical Report

From: Александр Поваляев <apovalyaev_at_[hidden]>
Date: Wed, 12 Feb 2025 20:52:42 +0300
[[[[[[ My comments below ]]]]]]

ср, 12 февр. 2025 г. в 17:50, Jonathan Wakely <cxx_at_[hidden]>:

>
>
> On Wed, 12 Feb 2025 at 14:36, Александр Поваляев <apovalyaev_at_[hidden]>
> wrote:
>
>> [[[[[My comments below]]]]]
>>
>> ср, 12 февр. 2025 г. в 15:46, Jonathan Wakely <cxx_at_[hidden]>:
>>
>>>
>>>
>>> On Wed, 12 Feb 2025 at 12:12, Александр Поваляев <apovalyaev_at_[hidden]>
>>> wrote:
>>>
>>>> Well, to make things more clear.
>>>> The process of inserting a new element with the same key value might
>>>> lead to a different result (different position of a newly inserted element).
>>>> And the result depends upon whether a hint is being used
>>>> within 'insert' method of std::multiset or not.
>>>> But once the new element has been inserted, the order is preserved
>>>> starting from that moment.
>>>>
>>>> Let's call an 'oldest' element that one which is closest to the
>>>> beginning among all the elements within the same key value.
>>>> To the moment when the 'reduce' method is called.
>>>>
>>>> 'reduce' method is not supposed to use 'equal_range' as we have already
>>>> discovered. Since it would double the number of compare operations required.
>>>>
>>>
>>> I don't agree.
>>>
>>> For example, std::multiset count is 2^20, 'equal_range' will require
>>>> 2*20 compare operations.
>>>>
>>>
>>> For 2^20 equivalent elements, I get 57 comparisons for equal_range. If
>>> you want to remove 1% of those 2^20 equivalent elements, your preferred
>>> approach would require 10k comparisons. But using equal_range would require
>>> 57 comparisons.
>>>
>>> https://godbolt.org/z/E1E5T6daM
>>>
>>> 57 is not double 10k.
>>>
>> Here we are talking about possible implementations.
>> Let me explain to you, how do I count comparison operations needed:
>> * Given a std::multiset of 2^20 elements in total (some of them do have
>> equal keys and some of them don't), 'count' will return 2^20
>> * Let it be a pair of two elements with the same key K (A and B), so both
>> A and B has the same key K and no any other element amongst 2^20 has key K
>> * The order in which these elements are placed inside std::multiset is A
>> then B, so A element is "oldest" and is placed closed to the beginning
>> * So, we're going to delete only one element of these two
>> * We call 'reduce' method of std::multiset container with the following
>> parameters reduce(K, 1)
>> * Here we comes to the place where we can compare two possible
>> implementations you proposed earlier:
>> Implementation 1: 'equal_range' is used and so about 20 * 2
>> comparisons are needed. I get 57 comparisons because a red-black tree is
>> used, not a pure binary.
>> Implementation 2: 'find' method is being used and so, it is required
>> about 20 comparisons (or so, may be 28)
>> * Implementation 2 requires twice as much comparisons as Implementation
>> 1. It is bad, because if a class of elements A and B defines custom
>> comparator, it will work significantly slower.
>> We're talking about STL library implementation (it is a gold standard
>> for industry), so we can't just afford just unnecessary doubling of any
>> operations;
>>
>> * Don't take about O(lnN) or O(N), cause our N is not going to infinity
>> :D, but we know for sure that Implementation 1 works at least twice slower
>> than Implementation 2
>> * I think we will need a third implementation, cause there is a lot of
>> cases which needed to be considered
>>
>
> So make it adaptive:
> https://godbolt.org/z/1jnE4xEEa
>
> I don't consider that necessary. The logarithmic one using equal_range
> wins in general, because optimizing for small sizes isn't usually needed
> (lookup in them is fast enough anyway).
>
[[[[[[Aleksandr]]]]]] I think that anything matters when it is on a
performance path. Selection of the right algorithm makes a significant
difference. Whereas a code optimization can also bring a positive effect.
That is why we're trying to make it more efficient
Implementation1->Implementation2->Implementation3, until everybody agrees
it is optimal for all the use-cases.

So, there was a question: for what particular use-case 'reduce' method can
be used.
I think one of the possible use-cases is 'priority queue which preserves
order of elements'.
Let's consider the case when at some particular moment of time, the
priority queue is consuming only the elements with the equal key.
So in this case, 'starvation' can happen, when at some quite long period of
time a particular element of the priority queue is not handled because all
the newly added elements are inserted before it.


> This still doesn't need to be a new member function of std::multiset
> though.
>
>
>
>

Received on 2025-02-12 17:52:57