Date: Wed, 21 May 2025 16:34:25 +0100
# Faster access to std::bitset
Hello everyone, I want to open up discussion on some improvements
I believe the committee should consider for the `std::bitset`
structure, in particular related to how the underlaying data
is accessed.
`std::bitset` is a strange collection present on the standard,
it's intended to abstract "a sequence consisting of a fixed
number of bits", and it does that by allowing accessor methods
to individual bits (`operator[]`) while at the same time allowing
fast construction (destruction) from (to) *short* sequences of
bits, `unsigned long` and `unsigned long long`.
However, this access to the underlying properties of this structure
break quite early on, since as soon as we want our `bitset` to keep
more than 64 bits (in most platforms) we can't construct it from
such a contiguous array of elements, nor can we extract the contiguous
elements.
I hope to start a conversation, and maybe draft up a paper, on adding
more powerful accesor methods for `std::bitset`.
## `std::bitset` is a very slow sequence
The main reason behind this proposal is that accessing the sequence
of bits in the current implementation of `std::bitset` is slow, very slow.
When googling on how to serialize a `std::bitset` the main result points to
https://stackoverflow.com/a/7463972, that suggests serializing by iterating
over
every bit, and storing that into an `std::vector`.
The second answer uses the only way to access the whole collection of the
`bitset` withouth iterating: `to_string`.
To understand better these differences, I tried a few approaches
on serialization, transforming a `std::bitset<_>` into an
`std::array<unsigned long long, _>`.
https://github.com/teofr/fast_bitset
The variation in performance is, as expected, huge.
If only comparing the accepted answer from the
StackOverflow link above, and a simple function that
copies the internal `std::array`, we get the following
results, varying the size of the bitset.
| Size | Iterating | Copying | Relative |
| ------- | --------- | ------- | -------- |
| 1 << 6 | 0.3 ns | 0.3 ns | 1 |
| 1 << 8 | 468 ns | 0.3 ns | 1560 |
| 1 << 10 | 1986 ns | 1.23 ns | 1614 |
| 1 << 12 | 8009 ns | 6.27 ns | 1277 |
| 1 << 14 | 32172 ns | 26.2 ns | 1227 |
As we can see, disallowing a direct access to the underlying
memory means a huge loss of performance.
Using `to_string` to serialize is, of course, worse than
both options.
> The performance numbers shown above are from my Mac M2,
> similar results gotten on a more powerful server
> (Intel Xeon Platinum 8168)
## Potential ideas
There's probably more ways to solve this, but these could be some,
in no particular order:
### Iterate over the carrier type
We could provide an internal iterator, where the elements
being iterated over are of type `ulong` or `ullong`.
```cpp
std::bitset<128> b;
for(unsigned long long i : b.to_ullong_iterator()) {
...
}
```
### Direct access to the internal `std::array`
Another option, would be to directly return an `std::array<unsigned long
long, _>` with
the contents, either by copy or by reference.
If returning by reference `std::bitset` should be restricted
to store the bits in contigouos memory.
## Constructing
Not mentioned in this email, but there's a dual face for the same
problem, constructing `bitset`s from data.
-----
Thanks for reading,
please let me know your thoughts.
Teo
Hello everyone, I want to open up discussion on some improvements
I believe the committee should consider for the `std::bitset`
structure, in particular related to how the underlaying data
is accessed.
`std::bitset` is a strange collection present on the standard,
it's intended to abstract "a sequence consisting of a fixed
number of bits", and it does that by allowing accessor methods
to individual bits (`operator[]`) while at the same time allowing
fast construction (destruction) from (to) *short* sequences of
bits, `unsigned long` and `unsigned long long`.
However, this access to the underlying properties of this structure
break quite early on, since as soon as we want our `bitset` to keep
more than 64 bits (in most platforms) we can't construct it from
such a contiguous array of elements, nor can we extract the contiguous
elements.
I hope to start a conversation, and maybe draft up a paper, on adding
more powerful accesor methods for `std::bitset`.
## `std::bitset` is a very slow sequence
The main reason behind this proposal is that accessing the sequence
of bits in the current implementation of `std::bitset` is slow, very slow.
When googling on how to serialize a `std::bitset` the main result points to
https://stackoverflow.com/a/7463972, that suggests serializing by iterating
over
every bit, and storing that into an `std::vector`.
The second answer uses the only way to access the whole collection of the
`bitset` withouth iterating: `to_string`.
To understand better these differences, I tried a few approaches
on serialization, transforming a `std::bitset<_>` into an
`std::array<unsigned long long, _>`.
https://github.com/teofr/fast_bitset
The variation in performance is, as expected, huge.
If only comparing the accepted answer from the
StackOverflow link above, and a simple function that
copies the internal `std::array`, we get the following
results, varying the size of the bitset.
| Size | Iterating | Copying | Relative |
| ------- | --------- | ------- | -------- |
| 1 << 6 | 0.3 ns | 0.3 ns | 1 |
| 1 << 8 | 468 ns | 0.3 ns | 1560 |
| 1 << 10 | 1986 ns | 1.23 ns | 1614 |
| 1 << 12 | 8009 ns | 6.27 ns | 1277 |
| 1 << 14 | 32172 ns | 26.2 ns | 1227 |
As we can see, disallowing a direct access to the underlying
memory means a huge loss of performance.
Using `to_string` to serialize is, of course, worse than
both options.
> The performance numbers shown above are from my Mac M2,
> similar results gotten on a more powerful server
> (Intel Xeon Platinum 8168)
## Potential ideas
There's probably more ways to solve this, but these could be some,
in no particular order:
### Iterate over the carrier type
We could provide an internal iterator, where the elements
being iterated over are of type `ulong` or `ullong`.
```cpp
std::bitset<128> b;
for(unsigned long long i : b.to_ullong_iterator()) {
...
}
```
### Direct access to the internal `std::array`
Another option, would be to directly return an `std::array<unsigned long
long, _>` with
the contents, either by copy or by reference.
If returning by reference `std::bitset` should be restricted
to store the bits in contigouos memory.
## Constructing
Not mentioned in this email, but there's a dual face for the same
problem, constructing `bitset`s from data.
-----
Thanks for reading,
please let me know your thoughts.
Teo
Received on 2025-05-21 15:34:45
