
Re: [std-proposals] Floating an idea: unstable_remove_if and friends

From: Sebastian Wittmeier <wittmeier_at_[hidden]>
Date: Fri, 7 Feb 2025 17:52:20 +0100
What is the bottleneck? How much of the theoretical RAM bandwidth is used up by the reads and writes? How does L1 caching play into it, including the cache line size? How well does branch prediction work? Would other data types (e.g. 4 bytes or 8 bytes wide) work better?

-----Original Message-----
From: Arthur O'Dwyer via Std-Proposals <std-proposals_at_[hidden]>
Sent: Fri 07.02.2025 17:48
Subject: Re: [std-proposals] Floating an idea: unstable_remove_if and friends
To: std-proposals_at_[hidden]; Ted Lyngmo <ted_at_[hidden]>
CC: Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]>; Giuseppe D'Angelo <giuseppe.dangelo_at_[hidden]>

You can find a few more benchmarks for `unstable_remove_if` by searching GitHub (oddly, not Google — I guess Google's AI is feuding with Microsoft's AI now) for "unstable_remove_if benchmark."

Eric Niebler's range-v3 library has `unstable_remove_if` (#1010), including some potentially important discussion about the iterator+sentinel overload. At first glance I think that discussion applies only to the ranges::action::unstable_remove_if "action," which wouldn't be included in the Standard Library version, but I could be wrong.

My own benchmark for SG14's `unstable_remove_if` (here) compares the algorithm on a vector(30'000) of array<int,16>. Its reported timings on my MacBook just now are 620µs for unstable_remove_if and 650µs for the stable remove_if.

It occurred to me yesterday that (at least when the predicate is usually false) the stable `remove_if` spends most of its time shifting down large chunks of elements by small fixed amounts. libc++, libstdc++, and MS all implement that as repeated single assignments: test, assign, increment; test, assign, increment...

But in the commonest case (contiguous iterator, trivially move-assignable value_type) it might make sense to implement it as test, increment, test, increment, test, increment, std::copy; test, increment, test, increment, test, increment, test, increment, std::copy; and so on, because the large std::copy chunks can be further optimized into memmove. I wonder if anyone's ever benchmarked that.

–Arthur

On Thu, Feb 6, 2025 at 7:34 PM Ivan Matek via Std-Proposals <std-proposals_at_[hidden]> wrote:

If you could share some code, that would be nice. It is possible that the compiler just decided to not unroll one loop and unroll another, or something silly like that. Although, the last time I checked this (10+ years ago), I also noticed minimal improvements except for types that do not have an efficient move constructor.

On Thu, Feb 6, 2025 at 10:10 PM Ted Lyngmo via Std-Proposals <std-proposals_at_[hidden]> wrote:

Hi! I created a few simple test cases to show what would be gained by using the set of unstable_* functions, since I didn't have my old tests saved. I'm not sure what's changed since I wrote the functions, but their performance is really bad. Only with very contrived data sets will they outperform the "stable" versions. I assume cache locality explains most of the difference. I must have been testing them with some bias back then, or used an ancient computer :) Anyway, I no longer think they are such a good idea.

--
Std-Proposals mailing list
Std-Proposals_at_[hidden]
https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals

Received on 2025-02-07 16:56:16