On Mon, Oct 24, 2022 at 1:22 PM Madhur Chauhan <madhur4127@gmail.com> wrote:
Thanks for the comments Arthur, much appreciated!

I think the core problem is absense of an efficient iterator for bitset.
Normal random access iterator will still access each bit and that's inefficient so we'll need something different that can process words and then trailing bits. 
If that fancy iterator exists then we don't need proposed function and <algorithm> can be used instead.

That's a great point. I had forgotten (but now I vaguely recall) that libc++ has several algorithms, such as std::find, specifically optimized for bit-iterators. They also have a few algorithms specifically optimized for deque::iterator, because deques are "piecewise contiguous" in a way that's known at compile time. (Sadly, P0447 hive::iterator can't benefit from the same optimizations because a hive's degree of piecewise-contiguity is known only at runtime.)

Double-sadly, `std::bitset` doesn't have iterators!  Bit-iterators only exist for `vector<bool>`.  I agree that "bitset is not iterable" might be the root of the problem here. But, again, if we think of a bitset as a mapping from small integers to bools (or a set of small integers), then it kind of makes sense that you can't "iterate" it — you'd just be iterating over "false, false, true, false, true, false..." which isn't relevant to the problem domain. (Contrast to vector<bool>, which is a sequence of bools you can iterate.)
I could see it being hard to convince LEWG that there was a real problem here, unfortunately.

Did you try benchmarking the <algorithm> version of your code, and the hand-rolled version, using `vector<bool>`? How does it compare?
I tried this version of the code:
https://godbolt.org/z/9q88qzG4e

GCC+libstdc++ gives these numbers on Godbolt:

-----------------------------------------------------

Benchmark           Time             CPU   Iterations

-----------------------------------------------------

bs_manual        4184 ns         2251 ns       252405

bv_manual        4061 ns         2349 ns       288193

bv_stl           4164 ns         2210 ns       255113

bs_library        130 ns         62.6 ns     14677544


Clang+libc++ gives these numbers on my local laptop:

-----------------------------------------------------

Benchmark           Time             CPU   Iterations

-----------------------------------------------------

bs_manual        2186 ns         2180 ns       315701

bv_manual        2115 ns         2112 ns       319833

bv_stl           39.7 ns         39.6 ns     17456620



Hmm, I had thought C++20 or C++23 was going to give us std::bit_iterator and std::bit_reference... what ever happened to those?
Not that they would be specifically useful in this scenario, since you can't get at the underlying storage of a std::bitset<N> in order to iterate over it, even if you had a std::bit_iterator adaptor.

–Arthur