Date: Sat, 10 Jan 2026 13:31:35 +0000
On Sat, Jan 10, 2026 at 1:50 AM Frederick Virchanza Gotham wrote:
>
> I think I pretty much have "std::lockfree_2ptrs" finished:
>
> https://godbolt.org/z/ExTMf8sn8
>
> I have given it two member functions, "set_and_increment" and
> "set_and_decrement" especially for lockfree containers which use
> [pointer + counter].
Now I need to explore whether this would actually be usable in the
likes of Boost's lockfree containers. It turns out that "spsc_queue"
doesn't need two pointers, but "queue" does. Here's how it pushes:
bool push(T const &t)
{
    node * n = pool.template construct<true, false>(t, pool.null_handle());
    handle_type node_handle = pool.get_handle(n);
    if ( n == NULL ) return false;
    for (;;)
    {
        tagged_node_handle tail = tail_.load(memory_order_acquire);
        node * tail_node = pool.get_pointer(tail);
        tagged_node_handle next = tail_node->next.load(memory_order_acquire);
        node * next_ptr = pool.get_pointer(next);
        tagged_node_handle tail2 = tail_.load(memory_order_acquire);
        if (BOOST_LIKELY(tail == tail2))
        {
            if (next_ptr == 0)
            {
                tagged_node_handle new_tail_next(node_handle, next.get_next_tag());
                if ( tail_node->next.compare_exchange_weak(next, new_tail_next) )
                {
                    tagged_node_handle new_tail(node_handle, tail.get_next_tag());
                    tail_.compare_exchange_strong(tail, new_tail);
                    return true;
                }
            }
            else
            {
                tagged_node_handle new_tail(pool.get_handle(next_ptr), tail.get_next_tag());
                tail_.compare_exchange_strong(tail, new_tail);
            }
        }
    }
}
In the above code, you can see that "queue::tail_" and "node::next"
seem to be atomic variables (I think 128-bit on x86_64). I can check
what they are on Godbolt:
https://godbolt.org/z/Ps7bos9cP
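Stripped of the node pool and the two-step tail update, the core trick in that loop is a compare-and-swap on a [handle + counter] pair, where every successful swap bumps the counter so that a recycled handle (the ABA problem) no longer compares equal. Here's a minimal sketch of just that trick, assuming a fixed-size array pool so that a handle is a 32-bit index. It's a Treiber stack rather than Boost's Michael-Scott queue, and the names (TaggedIndex, TaggedStack, kNull) are made up for illustration. Packing index and counter into 8 bytes means it only needs an ordinary 64-bit CAS.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical [handle + counter] pair, packed into 8 bytes so that a
// plain 64-bit CAS suffices. Illustrative only, not Boost's type.
struct TaggedIndex {
    std::uint32_t index;  // slot in the pool; kNull means "no node"
    std::uint32_t tag;    // bumped on every successful CAS to defeat ABA
};

constexpr std::uint32_t kNull = 0xFFFFFFFFu;

template <std::size_t N>
class TaggedStack {
public:
    TaggedStack() {
        for (auto& n : next_) n.store(kNull, std::memory_order_relaxed);
    }

    // Push pool slot i (the caller must currently own slot i).
    void push(std::uint32_t i) {
        TaggedIndex old_head = head_.load(std::memory_order_relaxed);
        for (;;) {
            next_[i].store(old_head.index, std::memory_order_relaxed);
            TaggedIndex new_head{i, old_head.tag + 1u};
            // On failure, old_head is refreshed with the current head.
            if (head_.compare_exchange_weak(old_head, new_head,
                                            std::memory_order_release,
                                            std::memory_order_relaxed))
                return;
        }
    }

    // Pop a slot index, or return kNull if the stack is empty.
    std::uint32_t pop() {
        TaggedIndex old_head = head_.load(std::memory_order_acquire);
        for (;;) {
            if (old_head.index == kNull) return kNull;
            // This read may be stale if another thread popped and re-pushed
            // the slot meanwhile; the tag makes the CAS below fail in that case.
            std::uint32_t next =
                next_[old_head.index].load(std::memory_order_relaxed);
            TaggedIndex new_head{next, old_head.tag + 1u};
            if (head_.compare_exchange_weak(old_head, new_head,
                                            std::memory_order_acquire,
                                            std::memory_order_acquire))
                return old_head.index;
        }
    }

private:
    std::atomic<TaggedIndex> head_{TaggedIndex{kNull, 0}};
    std::array<std::atomic<std::uint32_t>, N> next_;
};
```

Pool slots are never released while the stack is in use, which is what makes the possibly-stale read in pop() harmless; as I understand it, Boost's freelist relies on a similar property.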
Hmm... the [pointer + counter] is actually 64-bit instead of 128-bit,
because it's using pointer compression. So let's turn off the pointer
compression with a quick hack:
https://godbolt.org/z/rvde9foMT
Now we see that it's 128-bit, and that it's lock-free with LLVM clang,
but it uses a mutex with GNU g++.
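The 64-bit vs 128-bit difference, and the clang vs g++ difference, can also be probed without Boost. Here's a minimal sketch with two hypothetical pair layouts (WidePair, PackedPair): it prints each layout's size and whether std::atomic reports it as always lock-free. The answer for the 16-byte pair depends on the compiler, the standard library and flags such as -mcx16, so I'm not claiming any particular output. It assumes C++17; since is_always_lock_free is a compile-time constant, querying it doesn't pull in libatomic.

```cpp
#include <atomic>
#include <cstdint>
#include <cstdio>

// Uncompressed [pointer + counter]: two machine words (16 bytes on x86_64).
struct WidePair {
    void* ptr;
    std::uintptr_t counter;
};

// Compressed [pointer + counter]: everything squeezed into one 64-bit word.
struct PackedPair {
    std::uint64_t bits;
};

// Compile-time answers from the implementation (C++17).
constexpr bool kWideLockFree = std::atomic<WidePair>::is_always_lock_free;
constexpr bool kPackedLockFree = std::atomic<PackedPair>::is_always_lock_free;

void report() {
    std::printf("WidePair:   %zu bytes, always lock-free = %d\n",
                sizeof(WidePair), kWideLockFree);
    std::printf("PackedPair: %zu bytes, always lock-free = %d\n",
                sizeof(PackedPair), kPackedLockFree);
}
```

Compiling the same file with g++ and clang++, with and without -mcx16, should make the difference visible.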
Now let's take a look at the Boost code that decides the pointer
compression setting:
#if BOOST_ARCH_X86_64 || ( (BOOST_ARCH_ARM >= BOOST_VERSION_NUMBER(8,0,0)) && !BOOST_PLAT_ANDROID )
# define BOOST_LOCKFREE_PTR_COMPRESSION 1
#endif
So if you're building for x86_64, Boost gives you no way of disabling
pointer compression, not even a preprocessor flag.
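For anyone who hasn't seen pointer compression before, the idea is roughly this: on x86_64, user-space addresses typically fit in the low 48 bits, so the top 16 bits of a 64-bit word can hold the counter, and the whole [pointer + counter] pair fits an ordinary 64-bit CAS. Here's a minimal sketch with hypothetical helper names (pack, unpack_ptr, unpack_tag, with_next_tag), assuming 48-bit addresses; it is not Boost's actual tagged_ptr code.

```cpp
#include <cstdint>

// Assumption: addresses fit in the low 48 bits (true for typical x86_64
// user space), leaving the top 16 bits free for the counter.
constexpr unsigned kTagShift = 48;
constexpr std::uint64_t kPtrMask = (std::uint64_t{1} << kTagShift) - 1;

// Combine a pointer and a 16-bit counter into one 64-bit word.
inline std::uint64_t pack(void* p, std::uint16_t tag) {
    const auto addr =
        static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(p));
    return (addr & kPtrMask) | (static_cast<std::uint64_t>(tag) << kTagShift);
}

// Recover the pointer from the low 48 bits.
inline void* unpack_ptr(std::uint64_t word) {
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(word & kPtrMask));
}

// Recover the counter from the high 16 bits.
inline std::uint16_t unpack_tag(std::uint64_t word) {
    return static_cast<std::uint16_t>(word >> kTagShift);
}

// Value to CAS in: same pointer, counter + 1 (wraps around at 65536).
inline std::uint64_t with_next_tag(std::uint64_t word) {
    return pack(unpack_ptr(word),
                static_cast<std::uint16_t>(unpack_tag(word) + 1));
}
```

The catch is that the counter is only 16 bits wide, so it wraps after 65536 updates, which weakens the ABA protection compared with a full 64-bit counter in the uncompressed 128-bit layout.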
Now let's dissect the 'why' behind all of this. A few points to take
into account:
1) Very old x86_64 CPUs don't have the cmpxchg16b instruction (CPUID
flag "cx16")
2) The C++ committee decided to allow std::atomic to use locks
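On the cx16 point: whether the instruction is present can be detected at run time, which is one way a single binary could choose between a locking path and a cmpxchg16b path. Here's a minimal sketch, assuming GCC or Clang (it uses their <cpuid.h> and its bit_CMPXCHG16B constant); the function names cpu_has_cx16 and pick_path are hypothetical. The flag lives in bit 13 of ECX for CPUID leaf 1.

```cpp
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/Clang-specific header
#endif

// True if the CPU advertises CMPXCHG16B (CPUID.01H:ECX bit 13).
// Always false on non-x86 targets, where this particular check doesn't apply.
bool cpu_has_cx16() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid returns 0 if leaf 1 is not supported.
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) == 0) return false;
    return (ecx & bit_CMPXCHG16B) != 0;
#else
    return false;
#endif
}

// Hypothetical dispatch: decide once which implementation to use.
const char* pick_path() {
    return cpu_has_cx16() ? "cmpxchg16b" : "locking";
}
```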
My conclusion for why we are in this mess is as follows:
1) We're accommodating old computers hidden away in people's attics,
covered in cobwebs
2) The C++ committee made a very bad decision in allowing std::atomic
to use locks
I propose the following solutions:
1) Build separate binaries for x86-64 and x86-64-v2 (the former using
locks, the latter using cmpxchg16b)
2) Create a new standard library class, std::lockfree, which is a
perfect clone of std::atomic except that it can't use locks
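Something close to the proposed std::lockfree can be approximated today with a compile-time check. Here's a minimal sketch, assuming C++17 for is_always_lock_free; "lockfree" here is a hypothetical name in the global namespace, not a real standard type. It simply refuses to compile when std::atomic<T> might use a lock on the current target.

```cpp
#include <atomic>

// Hypothetical approximation of the proposed std::lockfree<T>:
// std::atomic<T>, plus a hard compile-time guarantee of no locks.
template <typename T>
struct lockfree : std::atomic<T> {
    static_assert(std::atomic<T>::is_always_lock_free,
                  "lockfree<T>: std::atomic<T> may use a lock on this target");

    using std::atomic<T>::atomic;     // inherit std::atomic's constructors
    using std::atomic<T>::operator=;  // keep "x = value;" working
};
```

With this, lockfree<T> for a 16-byte [pointer + counter] struct would fail to build on any toolchain that doesn't report 16-byte atomics as always lock-free, instead of silently falling back to a lock.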
Now, if anyone reading this email works in finance or another
performance-critical domain and wants a bonus this year . . . grep
through your codebase for "#include <boost/lockfree/queue.hpp>", and
wherever you find it, replace it with:
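// Must come before any other Boost.Lockfree header in this file,
// otherwise the include guards make these lines ineffective.
#undef BOOST_LOCKFREE_PREFIX_HPP_INCLUDED
#include <boost/lockfree/detail/prefix.hpp>
#undef BOOST_LOCKFREE_PTR_COMPRESSION
// Every file that uses boost::lockfree::queue must make the same change;
// mixing the two layouts across translation units is an ODR violation.
#include <boost/lockfree/queue.hpp>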
Rebuild your software, run it through your timing benchmark suite, and
then go tell your boss that you made it faster. I wouldn't mind a
fully remote job in finance, so feel free to mention my name.
Received on 2026-01-10 13:30:43
