Thanks for the comments Arthur, much appreciated!
I think the core problem is absense of an efficient iterator for bitset.
Normal random access iterator will still access each bit and that's inefficient so we'll need something different that can process words and then trailing bits.
If that fancy iterator exists then we don't need proposed function and <algorithm> can be used instead.
That's a great point. I had forgotten (but now I vaguely recall) that libc++ has several algorithms, such as std::find, specifically optimized for bit-iterators. They also have a few algorithms specifically optimized for deque::iterator, because deques are "piecewise contiguous" in a way that's known at compile time. (Sadly, P0447 hive::iterator can't benefit from the same optimizations because a hive's degree of piecewise-contiguity is known only at runtime.)
Double-sadly, `std::bitset` doesn't have iterators! Bit-iterators only exist for `vector<bool>`. I agree that "bitset is not iterable" might be the root of the problem here. But, again, if we think of a bitset as a mapping from small integers to bools (or a set of small integers), then it kind of makes sense that you can't "iterate" it — you'd just be iterating over "false, false, true, false, true, false..." which isn't relevant to the problem domain. (Contrast to vector<bool>, which is a sequence of bools you can iterate.)
I could see it being hard to convince LEWG that there was a real problem here, unfortunately.
Did you try benchmarking the <algorithm> version of your code, and the hand-rolled version, using `vector<bool>`? How does it compare?
I tried this version of the code:
GCC+libstdc++ gives these numbers on Godbolt:
-----------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------
bs_manual 4184 ns 2251 ns 252405
bv_manual 4061 ns 2349 ns 288193
bv_stl 4164 ns 2210 ns 255113
bs_library 130 ns 62.6 ns 14677544
Clang+libc++ gives these numbers on my local laptop:
-----------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------
bs_manual 2186 ns 2180 ns 315701
bv_manual 2115 ns 2112 ns 319833
bv_stl 39.7 ns 39.6 ns 17456620
Hmm, I had thought C++20 or C++23 was going to give us std::bit_iterator and std::bit_reference... what ever happened to those?
Not that they would be specifically useful in this scenario, since you can't get at the underlying storage of a std::bitset<N> in order to iterate over it, even if you had a std::bit_iterator adaptor.
–Arthur