Date: Tue, 31 Aug 2021 17:13:22 -0700
On Tuesday, 31 August 2021 12:17:57 PDT Marko Mäkelä via Std-Proposals wrote:
> I updated https://github.com/dr-m/atomic_sync once more. With GCC
> 11.2.0, the atomic_spin_mutex::wait_and_lock() now loops around a load,
> instead of looping around lock cmpxchg. On my system, the difference
> between pinning the test program to a single NUMA node, or running it
> normally (using both CPU sockets in the system) became smaller.
Now try doing that with 43 threads, each pinned to a different logical
processor, across different NUMA nodes (different CPU sockets). This was the
issue I spent weeks debugging last year.
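
For readers unfamiliar with the distinction in the quoted text, here is a
minimal sketch of a lock that spins on a plain load rather than on a locked
read-modify-write. It is not the atomic_sync code: the names ttas_spin_lock
and run_counter_test are hypothetical, and the real
atomic_spin_mutex::wait_and_lock() may differ in its memory ordering, backoff
and waiting strategy.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical illustration only; not the atomic_sync implementation.
class ttas_spin_lock {
    std::atomic<bool> locked_{false};

public:
    void lock() noexcept {
        for (;;) {
            // One read-modify-write attempt (a locked instruction on x86).
            if (!locked_.exchange(true, std::memory_order_acquire))
                return;
            // Contended: wait on a plain load so the cache line can stay
            // in a shared state until the holder releases the lock.
            while (locked_.load(std::memory_order_relaxed))
                std::this_thread::yield();
        }
    }

    void unlock() noexcept {
        locked_.store(false, std::memory_order_release);
    }
};

// Hypothetical helper: every thread increments a shared counter under the
// lock, and the final value of the counter is returned.
long run_counter_test(int threads, int iterations) {
    ttas_spin_lock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool)
        th.join();
    return counter;
}
```

The sketch does not pin threads to processors or NUMA nodes, so it says
nothing by itself about the many-thread, cross-socket case described above.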
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel DPG Cloud Engineering
Received on 2021-08-31 19:13:30