C++ Logo

std-discussion

Advanced search

Atomic notify may get missed by atomic wait

From: zwhconst <zwhconst_at_[hidden]>
Date: Fri, 22 Jul 2022 01:26:49 +0800
As discussed in another thread
(https://lists.isocpp.org/std-discussion/2022/07/1683.php),
in the following code
```
std::atomic<bool> b{false};

void thr1() {
     b.store(true, std::memory_order_relaxed);
     b.notify_one();
}

int main() {
     std::thread t(thr1);
     b.wait(false);
     t.join();
     return 0;
}
```
the waiting thread might get the notification but (arguably) miss the new
value and block forever. But the topic of this email is not about that
issue.
I'm here arguing that by the current standard wording, the waiting thread
might miss the notification in the first place.

Consider the following code:

```
std::atomic<bool> b{false};

void thr1() {
     b.wait(false);//atomic waiting operation W1
}
void thr2() {
     b.wait(false);//atomic waiting operation W2
}

int main() {
     std::thread t1(thr1);
     std::thread t2(thr2);
     b.store(true, std::memory_order_relaxed);
     b.notify_one();//atomic notifying operation N1
     b.notify_one();//atomic notifying operation N2
     t1.join();
     t2.join();
     return 0;
}
```

Is the above program guaranteed to terminate?
The standard's intention is that it is, right?

First let me paste the relevant standard text here:

"eligible to be unblocked by":
> A call to an atomic waiting operation on an atomic object M is
> eligible to be unblocked by a call to an atomic notifying operation on M
> if there exist side effects X and Y on M such that:
> (4.1) — the atomic waiting operation has blocked after observing the
> result of X,
> (4.2) — X precedes Y in the modification order of M, and
> (4.3) — Y happens before the call to the atomic notifying operation.

atomic::wait():
> void wait(T old, memory_order order) const noexcept;
> Effects: Repeatedly performs the following steps, in order:
> (30.1) — Evaluates load(order) and compares its value representation for
> equality against that of old.
> (30.2) — If they compare unequal, returns.
> (30.3) — Blocks until it is unblocked by an atomic notifying operation or
> is unblocked spuriously.

atomic::notify_one():
> void notify_one() noexcept;
> Effects: Unblocks the execution of at least one atomic waiting operation
> that is eligible to be unblocked (31.6) by this call, if any
such
> atomic waiting operations exist.

Let's assume the program executes as following:
Atomic waiting operations W1 and W2 first load `false` values and block,
then
N1 unblocks W1.

Is W1 eligible to be unblocked by N1?
It seems that W1 meets all the three requirements (4.1, 4.2 and 4.3). So
yes.

Is W2 eligible to be unblocked by N2?
Also yes.

Is W1 eligible to be unblocked by N2?
It depends on what "has blocked" mean in clause 4.1.

If it means "once ever blocked", then yes, W1 indeed is eligible to be
unblocked by N2 because W1 meets all the three requirements. For N2, W1 and
W2 are both eligible to be unblocked. Now N2 should "unblock the
execution of
at least one atomic waiting operation that is eligible to be unblocked" by
itself. The implementation is permitted to unblock only W1, again. And W2
gets unblocked by nobody, the program hangs forever. I think this is
ridiculous. This cannot be the intended meaning.

If it means "is blocking", it makes the "eligible to be unblocked by"
relation
a time-constrained relation -- an atomic waiting operation is eligible to be
unblocked, only when it's in step 30.3.
With this meaning, let's re-think the whole program. When N1 executes, if W1
and W2 are running to the point after 30.2, and before 30.3, they are not
eligible to be unblocked. N1 sees that no qualified atomic waiting operation
exists, unblocks nobody. N2 may also unblock nobody. Now the program hangs
forever.

So it seems to me that the current wording does not guarantee the
termination
of the above code, so as the original code in the other mailing thread.

Received on 2022-07-21 17:26:53