C++ Logo

std-proposals

Advanced search

Re: [std-proposals] Floating an idea: unstable_remove_if and friends

From: Peter Bindels <dascandy_at_[hidden]>
Date: Mon, 10 Feb 2025 11:34:35 +0100
On Mon, Feb 10, 2025 at 1:57 AM Ivan Matek via Std-Proposals <
std-proposals_at_[hidden]> wrote:

>
>
> On Fri, Feb 7, 2025 at 5:48 PM Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]>
> wrote:
>
>> My own benchmark for SG14's `unstable_remove_if` (here
>> <https://github.com/Quuxplusone/SG14/blob/b509fc470a30e1d19820487f899f987c12653a68/benchmarks/unstable_remove_bench.cpp#L20-L38>)
>> compares the algorithm on a vector(30'000) of array<int,16>. Its reported
>> timings on my MacBook just now are 620µs for unstable_remove_if, 650µs for
>> stable remove_if.
>>
>>>
>>> On my intel cpu with clang and libstdc++ numbers are a bit better for
> unstable, i.e. speedup is larger, but similar story
> UnstableRemoveIf/iterations:20000 243797 ns 243755 ns
> 20000
> StdPartition/iterations:20000 290692 ns 290617 ns
> 20000
> StdRemoveIf/iterations:20000 314727 ns 314668 ns
> 20000
>
> Numbers itself are not comparable since I prefer fixed number of
> iterations, but in general unstable is fastest.
>
> But what I wanted is to show that if one is creative(i.e. I looked for
> largest realistic setup where I expected unstable to shine) with test setup
> larger speedups can be observed.
> I have picked strings of maximum SSO size to make moves expensive and then
> things look quite good for unstable.
> unstable_remove_if/iterations:100000 92.3 ns 92.3 ns
> 100000
> std_partition/iterations:100000 106 ns 106 ns
> 100000
> std_remove_if/iterations:100000 142 ns 142 ns
> 100000
>
> When we make strings allocate speedups are as expected not that impressive
> because advantage over partition is smaller. Without profiling this seems
> expected since advantage over partition is that we do not need to keep
> "removed" item fine as partition does and
> that makes number of operations we need to do when operating in SSO buffer
> much smaller. When we are just rewiring heap pointers difference between
> move and swap is probably not that large.
> unstable_remove_if/iterations:100000 95.5 ns 95.5 ns
> 100000
> std_partition/iterations:100000 100 ns 100 ns
> 100000
> std_remove_if/iterations:100000 135 ns 135 ns
> 100000
>
> What is interesting that when we add some fraction of heap allocations in
> test data remove_if seems to suffer a lot. I presumed this is because of
> branch prediction, and basic checks with perf show indeed branch
> misprediction differences, but unclear if that is cause of performance
> differences.
> unstable_remove_if/iterations:100000 96.6 ns 96.6 ns
> 100000
> std_partition/iterations:100000 110 ns 110 ns
> 100000
> std_remove_if/iterations:100000 172 ns 172 ns
> 100000
>

What about the indirect benefits? From how I understand the proposal the
unstable_* functions do a whole lot less cache line clearing and
memcpy'ing; this should have a general benefit on software beyond what you
can measure in a microbenchmark.

Received on 2025-02-10 10:34:48