Re: [std-proposals] Implementability of P1478: Byte-wise atomic memcpy on x86_64

From: Thiago Macieira <thiago_at_[hidden]>
Date: Thu, 06 Apr 2023 15:13:12 -0300
On Thursday, 6 April 2023 14:26:49 -03 Thiago Macieira via Std-Proposals wrote:
> provided you obey some rules:
> * never cross a cacheline boundary (preferably, align naturally)
> * use operations of 16 bytes or less (ideally: use only 1-uop operations)
>   * for all current in-market processors and I believe this applies to AMD
>   * for all current P-core processors: 32- and 64-byte accesses are also
>     fine

I forgot this one because it never comes up in userspace, but in case you do
kernel or userspace-driver work:

* WB (write-back) memory only

UC (uncacheable) or WC (write-combining) memory will behave differently. And
this is the key to *why* it works: cache coherency. Atomic operations are
implemented by "locking" the full 64-byte cacheline (hence also the false
sharing and spinlock-on-RMW problems), so any load or store that completes in
a single cycle can't be torn either.
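To make the size and cacheline rules concrete, here is a rough sketch of how a byte-wise atomic memcpy might split a copy into naturally aligned power-of-two pieces of at most 16 bytes. The helper names next_chunk, for_each_chunk and count_chunks are hypothetical, and a 64-byte cacheline is assumed. Because 64 is a multiple of 16, no such piece can straddle a cacheline, so on WB memory each one is a candidate for a single untorn load or store. The sketch only plans the split: actually issuing one load or store per piece is the platform-specific part and is left out (plain memcpy or ordinary loads on shared data would still be a data race in the C++ memory model).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns the size of the largest naturally aligned
// power-of-two piece, at most 16 bytes, that starts at `addr` and fits in
// `remaining` bytes. Requires remaining > 0.
// Since 64 is a multiple of 16, such a piece never crosses a 64-byte cacheline.
constexpr std::size_t next_chunk(std::uintptr_t addr, std::size_t remaining) {
    std::size_t n = 16;
    // Halve until the piece is aligned to its own size and not too long.
    while (n > 1 && (addr % n != 0 || n > remaining)) {
        n /= 2;
    }
    return n;
}

// Calls visit(offset, size) for each piece covering [addr, addr + len).
// A real implementation would issue one load or store per piece here;
// that platform-specific part is deliberately omitted.
template <typename Visitor>
constexpr void for_each_chunk(std::uintptr_t addr, std::size_t len, Visitor&& visit) {
    std::size_t offset = 0;
    while (offset < len) {
        const std::size_t n = next_chunk(addr + offset, len - offset);
        visit(offset, n);
        offset += n;
    }
}

// Number of pieces a copy of `len` bytes starting at `addr` is split into.
constexpr std::size_t count_chunks(std::uintptr_t addr, std::size_t len) {
    std::size_t count = 0;
    for_each_chunk(addr, len, [&count](std::size_t, std::size_t) { ++count; });
    return count;
}
```

For example, a 15-byte copy starting at address 1 is split into pieces of 1, 2, 4 and 8 bytes, and an 8-byte copy at address 60 becomes two 4-byte pieces, one on each side of the cacheline boundary at 64.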

If you do manage to produce a situation where you think you've observed
tearing even with the above rules, contact me. I'd like to see how it happened
and investigate.
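Before reporting apparent tearing, it may help to confirm that the access in question really obeys the size and cacheline rules. A minimal sketch, assuming a 64-byte cacheline and the conservative 16-byte limit (not the 32- and 64-byte allowance for P-cores); the helper names are hypothetical. It cannot check the WB-memory rule, which depends on how the memory was mapped.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cacheline size for current x86-64 processors.
inline constexpr std::uintptr_t kCacheLine = 64;

// Hypothetical helper: true if an access of `size` bytes at `addr` is at most
// 16 bytes and lies entirely within one cacheline.
constexpr bool obeys_size_and_line_rules(std::uintptr_t addr, std::size_t size) {
    if (size == 0 || size > 16) {
        return false;
    }
    // First and last byte must fall in the same cacheline.
    return addr / kCacheLine == (addr + size - 1) / kCacheLine;
}

// Hypothetical helper for the stricter "align naturally" preference:
// power-of-two size of at most 16 bytes, address a multiple of the size.
constexpr bool is_naturally_aligned(std::uintptr_t addr, std::size_t size) {
    const bool power_of_two = size != 0 && (size & (size - 1)) == 0;
    return power_of_two && size <= 16 && addr % size == 0;
}
```

For example, a 16-byte access at address 48 stays inside one cacheline, while the same access at address 56 spans two. For a real pointer, the address would come from reinterpret_cast<std::uintptr_t>(ptr).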

Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel DCAI Cloud Engineering

Received on 2023-04-06 18:13:16