std-discussion

From: Ryan Nicholl <exaeta_at_[hidden]> · Date: Mon, 25 Jul 2022 04:37:46 +0000

--
Ryan Nicholl
(678)-358-7765
Sent from Proton Mail mobile
-------- Original Message --------
On Jul 24, 2022, 20:59, Nate Eldredge wrote:
> On Tue, 19 Jul 2022, Ryan Nicholl via Std-Discussion wrote: > Interjection here, but isn't the point of atomic wait/notify to > implement things like condition variables? In that case, a read-only > operation in relaxed ordering would not be required to obtain any value > written before notify_one unless notify_one has an implied release. I'm > unsure that the implied "happens before" ordering for notify exists > especially given spurious wakeups could cause it to return before write > and a write could happen in relaxed mode which implies no ordering. I > doubt this is an issue in practice since the kernel will probably flush > caches on context switch triggering a seq_cst-like behavior, but it's > good to think about. I would say notify does NOT order anything and you > "must" use acquire/release on the atomic to get ordering. Anything else > could require a lot of overhead in the implementation if the kernel > doesn't flush caches. My assumption is that wait/notify are intended as > low level wrappers that map directly to SYS_futex, waitv, > WaitForMultipleObjectsEx and similar system calls, so any kind of > sequencing would make the implementation tricky if missing in any OS. Under this interpretation, I believe it would actually be impossible to ever use wait/notify without risking deadlock, regardless of the memory order you select. I used relaxed ordering in my example for simplicity, but the problem persists with stronger orders. If we do: void thr1() { b.store(true, std::memory_order_release); b.notify_one(); } int main() { std::thread t(thr1); b.wait(false, std::memory_order_acquire); t.join(); return 0; } the only way to get synchronization between the threads is by having the load in wait() take its value from the store() in thr1 (https://eel.is/c++draft/atomics.order#2). But that is precisely what we are unable to prove, so we are still in the same pickle. Even if all accesses are seq_cst, I still can't prove that it's deadlock-free. The loads and stores participate in the total order S, but it is not made clear whether notify and unblock do so. So S could have both the pre-block and post-unblock loads preceding the store, in which case both loads return false, and the main thread blocks a second time. I have not been able to show that this would contradict any of the ordering rules. It could even fail under my "perverse" implementation upthread; nothing prevents the relaxed __counter.fetch_add() from being hoisted above the release b.store(), nor the relaxed __counter.load() from being sunk below an acquire b.load(). The barriers go the wrong way. And adding seq_cst doesn't help either: it gives us more guarantees about how seq_cst operations are ordered with respect to each other, but buys us nothing about how they are ordered with respect to weaker operations, which is the issue here. Adding synchronizes-with wouldn't normally require the kernel or anyone else to flush caches. As I understand it, modern multiprocessor systems have coherent cache anyway, and the issues with memory ordering come from features like per-core store buffers, which delay the commit to L1 cache. A release barrier just has to wait for the store buffer to drain, which is no worse than the corresponding cache misses on an in-order machine, and an acquire barrier just has to inhibit out-of-order speculative loads from passing it. So they are much cheaper than cache flushes or IPI or anything like that. -- Nate Eldredge nate_at_[hidden]