On Mon, Oct 21, 2019 at 11:33 AM Michael Spencer <bigcheesegs@gmail.com> wrote:
On Mon, Oct 21, 2019 at 10:33 AM Andrey Semashev via Std-Discussion <std-discussion@lists.isocpp.org> wrote:
On 2019-10-21 18:36, Thiago Macieira via Std-Discussion wrote:
> On Monday, 21 October 2019 00:44:24 PDT Jeremy Ong via Std-Discussion wrote:
>> You have now done an atomic load 3 times and it is possible that stores or
>> loads to foo have occurred in between. The pattern for atomic usage is to
>> typically load once, perform a change, and possibly compare/swap in a loop
>> in a manner that prevents ABA bugs.
>
> If the compiler can inline those three function calls, it may also coalesce
> the three loads into one. AFAIK, no compiler currently does it, but it could.
> If it can't inline, then the regular rules of visibility apply and if the
> atomic pointer's own address is not accessible by other threads and the called
> functions, coalescing can happen again.
>
> But if it can change, then the code is actually optimal, since you do have to
> reload it every time. Except that you have to write:
>
> foo.load()->do_one_thing();
> foo.load()->do_another_thing();
> foo.load()->do_a_final_thing();
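For reference, a rough sketch of the load-once / compare-exchange pattern Jeremy describes in the first quoted paragraph (the Foo type and the replace_foo function here are hypothetical, purely for illustration):

#include <atomic>

struct Foo {
    void do_one_thing() {}
    void do_another_thing() {}
    void do_a_final_thing() {}
};

Foo initial;
std::atomic<Foo*> foo{&initial};

void use_foo() {
    // Load the pointer once and keep using that snapshot, instead of
    // re-reading the atomic before every call.
    Foo* p = foo.load();
    p->do_one_thing();
    p->do_another_thing();
    p->do_a_final_thing();
}

void replace_foo(Foo* desired) {
    // Retry until no other thread has replaced the pointer in between.
    // Note: a plain pointer CAS like this does not by itself rule out
    // ABA; that needs extra machinery (tags, hazard pointers, etc.).
    Foo* expected = foo.load();
    while (!foo.compare_exchange_weak(expected, desired)) {
        // expected now holds the value another thread stored; retry.
    }
}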

I don't think the compiler can ever prove that the pointer is not
accessed by any other threads or signal handlers. Even in the case of
LTO, the compiler can't be sure that threads spawned by the OS or by
shared libraries, or signal handlers, don't modify the atomic pointer.

The compiler is always* free to transform atomic accesses so that they are performed as if in isolation (as long as it doesn't break forward progress). There is no way for another thread to observe the difference between the compiler doing this transformation and the hardware happening to schedule the threads and memory accesses so that the same interleaving occurs. This is allowed even if any of the do_* functions modify foo, since the compiler can just insert an atomic store at the end of the sequence (again, as long as it doesn't break forward progress).
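As a concrete sketch of that transformation, reusing Thiago's foo and do_* names from above, the compiler could rewrite

foo.load()->do_one_thing();
foo.load()->do_another_thing();
foo.load()->do_a_final_thing();

into something that behaves as if all three loads happened back to back and returned the same value:

Foo* p = foo.load();
p->do_one_thing();
p->do_another_thing();
p->do_a_final_thing();

subject to the caveats about forward progress (and, as discussed next, about the order in which memory is observed).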

I realized over lunch that I should be a bit more precise here. The question I stated below is the correct high-level way to look at things, and the statement here doesn't quite capture some of the other things that could make the transformation invalid. It's not enough to avoid breaking forward progress; the compiler must also ensure that it doesn't change the order in which memory is observed by other threads. For example, if one of the do_* functions does a sequentially consistent (SC) atomic load of some other variable, the compiler must reload foo because of the SC rules. Otherwise the program could observe a state where another thread did an SC store to foo and then an SC store to some other variable, and this thread sees the store to that other variable but not the store to foo.

You can see this on cppmem (http://svr-pes20-cppmem.cl.cam.ac.uk/cppmem/) with the following example:

int main() {
  atomic_int foo = 2;
  atomic_int z = 0;
  {{{ { foo.load().readsvalue(2);   // Don't observe the write to foo.
        z.load().readsvalue(1);     // Observe the write to z.
        foo.load().readsvalue(2); } // Reuse the previous load from foo.
  ||| { foo.store(3);
        z.store(1); }
  }}};
  return 0;
}

Of the 480 possible executions, none are consistent, so there is no execution the compiler could pick that would reuse the first load of foo, assuming it can't control the value it reads for z.
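For anyone who wants to relate the cppmem litmus test to ordinary C++, here is a rough standard-C++ analogue (the thread structure and names are mine, not output of the tool). All accesses default to seq_cst, so once the reader has seen z == 1 it must also see foo == 3, and reusing the earlier value 2 for the last load would not correspond to any consistent execution:

#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> foo{2};
std::atomic<int> z{0};

void reader() {
    int f1 = foo.load();  // may still see the initial value 2
    (void)f1;             // kept to mirror the first load in the cppmem test
    if (z.load() == 1) {  // observed the write to z
        // Every access here is seq_cst, and the writer's foo.store(3)
        // precedes its z.store(1), so having seen z == 1 this later
        // load cannot still return 2; the compiler may not simply
        // reuse f1 here.
        assert(foo.load() == 3);
    }
}

void writer() {
    foo.store(3);
    z.store(1);
}

int main() {
    std::thread t1(reader), t2(writer);
    t1.join();
    t2.join();
}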

- Michael Spencer
 

A good way to think about this is to ask the question: "Is this specific execution always valid according to the C++ memory model, regardless of what other threads do?" If so, then it's valid for the compiler to transform the code so that this execution always happens.

- Michael Spencer