[std-proposals] Faster access to std::bitset

From: Teodoro Freund <tfreund95_at_[hidden]>
Date: Wed, 21 May 2025 16:34:25 +0100
# Faster access to std::bitset

Hello everyone, I want to open up a discussion on some improvements
I believe the committee should consider for the `std::bitset`
structure, in particular related to how the underlying data
is accessed.

`std::bitset` is a strange collection in the standard: it is
intended to abstract "a sequence consisting of a fixed number of
bits", and it does that by providing accessor methods for individual
bits (`operator[]`) while at the same time allowing fast conversion
from and to *short* sequences of bits, namely `unsigned long` and
`unsigned long long`.

However, this access to the underlying representation breaks down
quite early: as soon as we want our `bitset` to hold more than
64 bits (on most platforms), we can neither construct it from a
contiguous array of such elements nor extract the contiguous
elements back out.
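
As a small illustration (assuming a 64-bit `unsigned long long`),
the round trip breaks as soon as a bit above position 63 is set:

```cpp
#include <bitset>

int main() {
    std::bitset<64> small(0xFFFF'FFFF'FFFF'FFFFull);
    unsigned long long v = small.to_ullong(); // fine, round-trips

    std::bitset<128> big;
    big.set(100);         // any bit outside the low 64 bits...
    // big.to_ullong();   // ...makes this throw std::overflow_error,
                          // and there is no standard way to get at the
                          // upper words without iterating bit by bit.
    (void)v;
}
```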

I hope to start a conversation, and maybe draft a paper, on adding
more powerful accessor methods to `std::bitset`.

## `std::bitset` is a very slow sequence

The main reason behind this proposal is that accessing the sequence
of bits in current implementations of `std::bitset` is slow, very slow.
When googling how to serialize a `std::bitset`, the top result points to
https://stackoverflow.com/a/7463972, which suggests serializing by
iterating over every bit and storing the bits into an `std::vector`.

The second answer uses the only way to access the whole collection of the
`bitset` without iterating: `to_string`.

To better understand these differences, I tried a few approaches
to serialization, transforming a `std::bitset<_>` into an
`std::array<unsigned long long, _>`:
https://github.com/teofr/fast_bitset
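
For reference, here is a rough sketch of the kind of bit-by-bit
conversion that the "Iterating" column below measures (the actual
benchmark code lives in the repository and may differ in detail):

```cpp
#include <array>
#include <bitset>
#include <climits>
#include <cstddef>

// Rebuild each word bit by bit, because operator[] is the only
// portable way to read the data out of a large bitset.
template <std::size_t N>
auto to_words_by_bit(const std::bitset<N>& b) {
    constexpr std::size_t W = sizeof(unsigned long long) * CHAR_BIT;
    std::array<unsigned long long, (N + W - 1) / W> words{};
    for (std::size_t i = 0; i < N; ++i)
        if (b[i])
            words[i / W] |= 1ull << (i % W);
    return words;
}
```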

The variation in performance is, as expected, huge.
Comparing only the accepted answer from the
StackOverflow link above and a simple function that
copies the internal `std::array`, we get the following
results, varying the size of the bitset.

| Size (bits) | Iterating | Copying | Ratio (iterating / copying) |
| ------- | --------- | ------- | -------- |
| 1 << 6 | 0.3 ns | 0.3 ns | 1 |
| 1 << 8 | 468 ns | 0.3 ns | 1560 |
| 1 << 10 | 1986 ns | 1.23 ns | 1614 |
| 1 << 12 | 8009 ns | 6.27 ns | 1277 |
| 1 << 14 | 32172 ns | 26.2 ns | 1227 |

As we can see, disallowing direct access to the underlying
memory means a huge loss of performance.

Using `to_string` to serialize is, of course, worse than
both options.

> The performance numbers shown above are from my Mac M2;
> similar results were obtained on a more powerful server
> (Intel Xeon Platinum 8168).

## Potential ideas

There are probably more ways to solve this, but here are a few,
in no particular order:

### Iterate over the carrier type

We could provide an iterator (or range) over the underlying words,
where the elements being iterated over are of type `unsigned long`
or `unsigned long long`.

```cpp
std::bitset<128> b;
for (unsigned long long word : b.to_ullong_iterator()) {
    // ... use the 64-bit word ...
}
```
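
If something like that existed, the serialization problem above
becomes a trivial copy. A sketch of how it might be used (the name
`to_ullong_iterator` is, of course, only a placeholder):

```cpp
#include <bitset>
#include <vector>

// Hypothetical: copy each 64-bit word straight into an output
// buffer, with no per-bit work.
std::vector<unsigned long long> serialize(const std::bitset<4096>& b) {
    std::vector<unsigned long long> out;
    for (unsigned long long word : b.to_ullong_iterator()) // not standard today
        out.push_back(word);
    return out;
}
```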

### Direct access to the internal `std::array`

Another option would be to directly return an
`std::array<unsigned long long, _>` with the contents, either by
copy or by reference. If returning by reference, `std::bitset`
would have to be restricted to storing its bits in contiguous
memory.
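
A sketch of what the "by copy" variant might look like in use
(the name `to_array` is only an illustration, not proposed wording):

```cpp
#include <array>
#include <bitset>
#include <cstring>

// Hypothetical: to_array() returns a copy of the underlying words,
// so serialization collapses to a single memcpy.
template <std::size_t N>
void serialize(const std::bitset<N>& b, unsigned char* out) {
    auto words = b.to_array(); // not standard today
    std::memcpy(out, words.data(), words.size() * sizeof(words[0]));
}
```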

## Constructing

Not mentioned in this email, but there is a dual face to the same
problem: constructing `bitset`s from data.
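
Any extraction interface probably wants a matching construction
interface. A purely illustrative sketch (no such constructor exists
today):

```cpp
#include <bitset>
#include <span>

// Hypothetical: build a bitset directly from a buffer of 64-bit words.
std::bitset<128> deserialize(std::span<const unsigned long long> words) {
    return std::bitset<128>(words); // illustration only, not standard
}
```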

-----

Thanks for reading,
please let me know your thoughts.

Teo
