
Re: Fwd: Synchronization of atomic notify and wait

From: Marcin Jaczewski <marcinjaczewski86_at_[hidden]>
Date: Thu, 21 Jul 2022 00:44:56 +0200
On Wed, 20 Jul 2022 at 16:59, <language.lawyer_at_[hidden]> wrote:
>
> On 20/07/2022 15:08, Marcin Jaczewski via Std-Discussion wrote:
> > On Wed, 20 Jul 2022 at 10:46, <language.lawyer_at_[hidden]> wrote:
> >>
> >> On 20/07/2022 13:17, Marcin Jaczewski wrote:
> >>> How could it be possible that after unblocking it loads old value?
> >>
> >> Do you want me to describe a hypothetical implementation? Ok.
> >> Weak cache model. The main thread is on Core 0, the non-main one on Core 1. Each having own L1 cache. The main thread loads false and blocks by putting Core 0 into sleep state.
> >> The non-main thread writes true, which only lives in its L1 cache, calls `notify`, which sends IPI to Core 0, the main thread wakes up, reads false from its L1 cache and blocks again.
> >>
> >> Ofc, the non-main thread could "publish" its L1 cache or the main thread could ask for update, but why do they have to do it, from the C++ abstract machine POV? What they violate if they don't do it?
> >>
> >
> > Hold on, could I even do any atomic operation on this system?
>
> Why not? If there are ways to flush/update the caches.

Then it needs to use that flush/update for every atomic operation to
maintain the C++ memory model.
For some CPUs it could be impossible to implement C++ atomics at all.
But that is irrelevant; we only discuss CPUs that can follow, or give
stronger guarantees than, what C++ requires.

How exactly this CPU works, how many cores or cache levels it has, or
whether it has a weak or strong memory model, is all opaque to the C++
abstract machine.
The only thing that matters is that the implementation of `store` and
`load` has exactly the behavior required by the standard.
It could even be implemented by halting the other cores if there is no other way.
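
As a sketch of what I mean (all the names below the language level are
made up; the point is only that the lowering is the implementation's
problem, not the abstract machine's):

```
// Hypothetical lowering of seq_cst store/load for a weak-cache CPU.
// flush_cache_line / invalidate_cache_line / full_barrier are made-up
// stand-ins for whatever the real target provides.
inline void flush_cache_line(const void*)      { /* platform-specific */ }
inline void invalidate_cache_line(const void*) { /* platform-specific */ }
inline void full_barrier()                     { /* platform-specific */ }

template <class T>
void impl_store_seq_cst(T* addr, T value) {
    *addr = value;
    flush_cache_line(addr);  // make the new value visible to other cores
    full_barrier();          // keep seq_cst ordering with surrounding accesses
}

template <class T>
T impl_load_seq_cst(const T* addr) {
    full_barrier();
    invalidate_cache_line(addr);  // drop a possibly stale local copy
    return *addr;
}
```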

>
> > Whole point of the atomic was to bypass caches.
>
> Don't think this was a point.

If the cache could hold stale data, then it was exactly the point. How
do you make read-acquire and write-release work without flushing the cache?
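
E.g. the usual message-passing pattern only works if the release store
makes the earlier plain write visible to the thread doing the acquire
load (a minimal sketch):

```
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};

void producer() {
    payload = 42;                                   // plain write
    ready.store(true, std::memory_order_release);   // publish it
}

void consumer() {
    while (!ready.load(std::memory_order_acquire))  // wait for the flag
        ;
    assert(payload == 42);  // must see the write made before the release store
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}
```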

>
> > And different flavors of atomic operation are for reducing the size of invalidated cache.
>
> Like, relaxed flush L1 to L2 only, ack-rel — also L2 to L3, and seq-cst — L3 to RAM?

No, any of these operations could in some cases flush all the way to
RAM; that depends only on what is actually needed in the given case.
Besides, I mean how many cache lines need to be flushed/reloaded:
Relaxed only needs the cache line where the atomic variable lives to be
up to date; the operations around it can still use old cached values.
Acquire needs all other cache lines that will be read afterwards to be fresh.
Release needs everything written before it to be flushed.

But this is not only about caches; there are also compiler reorderings
and CPU reorderings of operations, and they too need to match what C++ requires.
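
A small sketch of the difference: with relaxed ordering the individual
operations are still atomic, but nothing stops the compiler or the CPU
from reordering them relative to each other:

```
#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{0}, y{0};

void writer() {
    x.store(1, std::memory_order_relaxed);
    y.store(1, std::memory_order_relaxed);
}

void reader() {
    int b = y.load(std::memory_order_relaxed);
    int a = x.load(std::memory_order_relaxed);
    // With relaxed ordering, "a=0 b=1" is a permitted outcome: the stores
    // (or the loads) may be reordered by the compiler or the CPU.
    // With a release store of y and an acquire load of y, b == 1 would imply a == 1.
    std::printf("a=%d b=%d\n", a, b);
}

int main() {
    std::thread t1(writer), t2(reader);
    t1.join();
    t2.join();
}
```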

>
> > Simply by analogy, if we replace this load by increment. Even relaxed operations should be consistent.
>
> RMW operations have special properties: https://timsong-cpp.github.io/cppwp/n4861/atomics.order#10
> You can't just replace load with RMW, draw some conclusions, and replace RMW with load back leaving the conclusions.

```
void wait(T, memory_order = memory_order::seq_cst) const noexcept;
void store(T, memory_order = memory_order::seq_cst) noexcept;
```

and https://timsong-cpp.github.io/cppwp/n4861/atomics.order#5 :

"""
[Note:
This definition ensures that S is consistent with the modification
order of any atomic object M.
It also ensures that a memory_order::seq_cst load A of M gets its
value either from the last modification of M that precedes A in S or
from some non-memory_order::seq_cst modification of M that does not
happen before any modification of M that precedes A in S.
— end note]
"""

For your case this behaves the same as RMW; it could miss some writes
that are not `seq_cst`, but that is not our case here.
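
By "your case" I mean, presumably, the version with the default
(seq_cst) arguments, roughly like this sketch:

```
#include <atomic>

std::atomic<bool> flag{false};

void main_thread() {
    flag.wait(false);   // seq_cst by default; blocks while the value compares equal to false
}

void other_thread() {
    flag.store(true);   // seq_cst by default
    flag.notify_one();  // unblock the waiter
}
```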

This means the default case is trivial. Going back to the first
message, there is a difference in the example given there: it uses
`relaxed`, and that is a trickier case.

Probably this applies:
https://timsong-cpp.github.io/cppwp/n4861/intro.multithread#intro.races-18
If `notify` happens before `unblock`, this makes a "happens before"
chain from the relaxed `store` to the `load`.
The question is whether this first "happens before" can be assumed,
given that `unblock` is triggered directly by `notify`.
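
I.e., for the trickier variant (a sketch, assuming the example from the
first message looks roughly like this):

```
#include <atomic>

std::atomic<bool> flag{false};

void main_thread() {
    // Does the check inside wait() have to eventually observe the relaxed
    // store below once the thread is unblocked by notify_one()?
    flag.wait(false, std::memory_order_relaxed);
}

void other_thread() {
    flag.store(true, std::memory_order_relaxed);
    flag.notify_one();
    // The open question: is there a "happens before" edge from this
    // notify_one() to the unblocking of wait() (intro.races), which would
    // chain back to the relaxed store?
}
```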

Received on 2022-07-20 22:45:09