Date: Tue, 5 May 2026 08:14:52 +0200
On 4/27/26 13:23, Máté Ték wrote:
> Compiled with Visual Studio 2026, C++23, /O2 optimizations.
Please also test with gcc and/or clang. Also, please present the results
in a consistent order; changing the order of separate/ranges/fused from
one block to the next is not helpful. Neither is switching units
(ns -> us) within a single block.
And please spare us six digits of hallucinated precision.
And, if you want to do that properly, run each test case several times
and also print the standard deviation of the runs. This gives us some
idea of the reproducibility.
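
A minimal sketch of such a harness, assuming each test case can be wrapped
in a callable. `RunStats`, `summarize`, `time_repeated` and `print_row` are
hypothetical names, not taken from the original benchmark. It times every
run separately with std::chrono::steady_clock, reports the mean and the
sample standard deviation, and prints every row in one unit with two
decimals.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <numeric>
#include <ratio>
#include <vector>

// Mean and spread of repeated timings of one test case, always in
// microseconds, so every row of a block uses the same unit.
struct RunStats {
    double mean_us;
    double stddev_us; // sample standard deviation (n - 1 in the denominator)
};

// Reduces raw per-run timings (in microseconds) to mean and sample std. dev.
RunStats summarize(const std::vector<double>& samples_us) {
    if (samples_us.empty()) return {0.0, 0.0};
    const double n = static_cast<double>(samples_us.size());
    const double mean = std::accumulate(samples_us.begin(), samples_us.end(), 0.0) / n;
    double sq_dev = 0.0;
    for (double s : samples_us) sq_dev += (s - mean) * (s - mean);
    const double stddev = samples_us.size() > 1 ? std::sqrt(sq_dev / (n - 1.0)) : 0.0;
    return {mean, stddev};
}

// Runs `fn` `runs` times, timing each run separately with a monotonic clock.
// `fn` stands in for one test case (hypothetical; the original harness is not shown).
template <class F>
RunStats time_repeated(F&& fn, std::size_t runs) {
    std::vector<double> samples_us;
    samples_us.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        fn();
        const auto stop = std::chrono::steady_clock::now();
        samples_us.push_back(std::chrono::duration<double, std::micro>(stop - start).count());
    }
    return summarize(samples_us);
}

// Prints one row with two decimals; call it in the same variant order
// (e.g. Separate, Ranges, Fused) for every vector size.
void print_row(const char* variant, const char* size, const RunStats& s) {
    std::printf("%-9s %-9s %12.2f us  (sd %.2f us)\n", variant, size, s.mean_us, s.stddev_us);
}
```
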
Please also test with "regular" algorithms, e.g. std::find_if and
std::min_element.
More comments inline below.
> Run on HP ZBook i7-13850HX
>
> Separate vec_100 339.355 ns
> Ranges vec_100 424.226 ns
> Fused vec_100 311.085 ns
>
> Separate vec_250 820.78 ns
> Fused vec_250 675.184 ns
> Ranges vec_250 1.2445 us
>
> Fused vec_500 1.36191 us
> Ranges vec_500 2.45033 us
> Separate vec_500 1.73693 us
>
> Fused vec_1k 2.82963 us
> Ranges vec_1k 4.74799 us
> Separate vec_1k 4.02432 us
>
> Fused vec_2k 5.96424 us
> Ranges vec_2k 11.0527 us
> Separate vec_2k 13.5118 us
>
> Fused vec_5k 26.3804 us
> Ranges vec_5k 38.922 us
> Separate vec_5k 45.5524 us
>
> Fused vec_10k 74.6032 us
> Separate vec_10k 95.6572 us
> Ranges vec_10k 100.218 us
That's a bit of a jump here (10k to 1M). I suggest testing with 100k elements, too.
> Fused vec_1M 10.8347 ms
> Separate vec_1M 10.9401 ms
> Ranges vec_1M 11.72 ms
>
> Separate vec_10M 111.501 ms
> Fused vec_10M 110.26 ms
> Ranges vec_10M 124.79 ms
>
> Separate vec_100M 1.11905 s
> Ranges vec_100M 1.21792 s
> Fused vec_100M 1.06251 s
This is remarkably close. Please check the assembly to see whether
the optimizer merged the two loops in "separate".
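
With MSVC, /FA produces an assembly listing (.asm); with gcc or clang, -S
does the same. A minimal pair to compare might look like this, assuming
"separate" means two independent passes over the same vector and "fused"
a single pass doing both. The original kernels are not shown in this
thread, so sum/max is only a stand-in and the names are hypothetical. If
the listing for the separate version contains a single loop over the
data, the compiler fused the loops.

```cpp
#include <algorithm>
#include <climits>
#include <utility>
#include <vector>

// Hypothetical "separate" kernel: two independent passes over the same data.
// In the listing, check whether one loop or two remain.
std::pair<long long, int> sum_max_separate(const std::vector<int>& v) {
    long long sum = 0;
    for (int x : v) sum += x;             // pass 1: sum
    int mx = INT_MIN;
    for (int x : v) mx = std::max(mx, x); // pass 2: maximum
    return {sum, mx};
}

// Hypothetical "fused" kernel: both reductions in a single pass.
std::pair<long long, int> sum_max_fused(const std::vector<int>& v) {
    long long sum = 0;
    int mx = INT_MIN;
    for (int x : v) {
        sum += x;
        mx = std::max(mx, x);
    }
    return {sum, mx};
}
```
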
> The fused approach seems to be fastest most of the time, but never worse than the separate approach.
> Also, the std::ranges approach is almost always the slowest for some reason? Maybe due to bounds checking? Maybe I am misusing them? Feel free to fix it.
Have a look at the generated assembly and find out why.
Jens
Received on 2026-05-05 06:14:55
