Re: [std-proposals] Gap in the Standard Library? Single-pass fused reductions (similar to Clojure transducers)

From: Jens Maurer <jens.maurer_at_[hidden]>
Date: Sun, 26 Apr 2026 08:07:21 +0200
On 4/26/26 07:06, Jan Schultke via Std-Proposals wrote:
> I don't think it's necessary to cover every possible combination of algorithms/reductions in one function. At some point, you will just need to use a custom loop, and that's fine.
>
> The other day I was refactoring some code at my workplace that looked something like this (simplified):
>
>
>
> std::vector<int> values = ... ;
> auto isMyFavorite = [](int i) -> bool { ... };
> int* myFavorite = nullptr;
> int* minEven = nullptr;
> for (int& i : values) {
>     if (!myFavorite && isMyFavorite(i))
>         myFavorite = &i;
>     if ((i % 2 == 0) && (!minEven || i < *minEven))
>         minEven = &i;
> }
>
>
> It's not a great example. In your transformation below, you mention find_if, but that's not exactly what you have here. Thanks to the !myFavorite guard you do find the first matching element, but you don't quit early once you've found it: the loop keeps scanning the whole range. find_if stops at the first match, so the loop may be a lot slower.
>
> You can already break this up into:
>
> * std::ranges::find_if(values, isMyFavorite)
> * std::ranges::min_element(values | std::views::filter(even))

It would be good to have an actual catalog of the standard library functions
that would be in scope for the new facility. I suspect that interface
surface would be quite large, and standardizing it is likely a multi-year
effort. Are you ready for that?

> You're saying that you would like to do it in a single pass, but have you actually measured whether that would improve performance meaningfully in any scenario?

I agree that this single-pass approach (via a hand-written loop) should be
benchmarked against the traditional multi-pass approach.

My guess is that, for carefully chosen algorithms, it does make a difference for
large ranges (> 1G elements) where your throughput is limited by main memory bandwidth.
For a lot of smaller situations, my guess is that the sequential memory access
pattern will trigger the prefetcher of the CPU, making cache locality concerns
almost moot.

Jens

Received on 2026-04-26 06:07:25