sg5: Re: [SG5] Oct 20 minutes

From: Michael Spear <mfs409_at_[hidden]>
Date: Wed, 21 Oct 2020 09:54:38 -0400

Hans, thanks for the great summary of our discussion. In the spirit of
"keeping the conversation going", I want to point out the following about
implementation.

If an implementation protects all transactions with a single, global,
reentrant lock, then there are three pieces to the implementation.
- The first is modifying the parser to support the new syntax. This is
common to any implementation, so whatever effort it entails is really a
fixed cost for the TS.
- The second is some kind of run-time library that maps the beginning and
end of a lexically scoped transaction to operations for lock acquire and
lock release. This is quite simple.
- The third is whatever static analysis is needed to warn programmers when
their atomic blocks do something forbidden. This may be difficult,
depending on how we specify things. For example, if the compiler *must*
warn when an exception escapes a transaction, then if the compiler does not
have access to definition of a function that might throw, how could it give
such a warning?

My guess is that the third piece is going to be "best effort" by the
compiler writer. Surely some things should produce warnings, but with an
understanding that there might be false negatives (e.g., we miss a warning
for a transaction that has undefined behavior, because the transaction
spans TUs and in some other TU, it uses std::cout or a std::atomic
variable, or opens a socket, or does something else that can legally
constitute inter-thread communication).

Turning to the second popular implementation, which uses Intel/ARM/IBM HTM,
note that pieces 1 and 3 are unchanged. As for piece 2, it is only
slightly more complicated, still quite simple.

Now on to the third implementation, with STM. Task 2 is much harder,
because we have to instrument the reads and writes in the bodies of
functions called from the transaction. The key simplification here is that
we do not support explicit self-abort. While the user might assume that
the implementation uses speculation to get better concurrency, there is no
speculation or rollback in the proposed TS. This means that even a
sophisticated STM implementation can fall back to serializing on a single
global lock. For multi-TU transactions, such fallback is expected in the
proposed TS, because we do not want viral annotations on functions, and we
did not specify "transaction-safe" annotations or hints.

>From a pragmatic perspective, this means any function in the C++ standard
library can simply be declared "transaction safe", as long as there is a
reasonable argument for doing so. Since HTM and STM both are allowed to
fall back to a lock at any time, the only thing we really need to do is
ensure that a single lock implementation that allowed standard library
function X would still be correct, and then it should be valid to say that
standard library function X is safe to call from within transactions.

In the case of memory allocation, the argument is straightforward: making
assumptions about the bit representation of the return value of operator
new is not legal. Correct C++ programs cannot use return values from
operator new to infer the behavior of other threads, or to communicate with
each other, because doing so is not portable, and the implementation of the
allocator is not defined by the specification. Therefore, as long as an
STM transaction's call to operator new or operator delete can be "logically
undone" if the run-time system aborts and restarts the transaction, there
is no semantic issue. At the implementation level, operator new might call
sbrk() or mmap(), which could cause an HTM transaction to fall back to
serializing, but that's not a correctness issue.

In the case of throw, the expression of exceptions in the intermediate
representation is via function calls or intrinsics (cxa_throw, cxa_catch,
LLVM landing pads, etc). The implementation of these is dependent on the
OS and architecture. For example, IIRC on Linux targets they traverse
DWARF tables within the binary itself. So, again, making assumptions about
these is not portable, and correct programs cannot assume anything more
than the high-level specified behavior. That behavior is little more than
"nonlocal jump, with destruction of everything on the popped part of the
run-time stack". Note: if I am wrong, and the behavior of stack unwinding
is more carefully defined, and that definition includes the possibility of
stack unwinding being a legal form of inter-thread communication, then my
whole argument is invalid. Hopefully someone more knowledgeable about
exceptions in C++ can confirm or refute my claim.

In the original TMTS, exceptions are tricky because there need to be writes
to the exception object, and these writes need to be instrumented in a
special way, because these writes can escape the transaction even if the
transaction gets undone. In the proposed TMTS, exceptions cannot escape a
transaction (and the transaction doesn't ever have to get undone), which
means that STM can fall back to a single lock whenever it encounters a
throw. This is easy to do, because the STM instrumentation pass simply
says "here's a function for which I don't have a definition... I guess I'll
just serialize the transaction here". cxa_throw is no different than a
multi-TU transaction.

If we *wanted* to specify commit-on-exception-escape, it would be trivial
to support in a reasonable TM implementation. I don't think we want to,
and I'm in favor of forbidding exceptions from escaping transactions.
However, the use of exceptions within a transaction does not appear to pose
an implementation challenge, nor does it appear to pose a correctness
challenge.

So, in summary, I do not see implementation issues with supporting
throw+catch inside a transaction, or allocation inside a transaction. As
the minutes state, this will open up a huge portion of the standard library
to transactions (perhaps only threads, coroutines, mutexes, atomics, and
iostreams would still be forbidden?).

- Mike

On Wed, Oct 21, 2020 at 12:28 AM Hans Boehm via SG5 <sg5_at_[hidden]>
wrote:

> Following up on our resolution to discuss this between meetings:
>
> In the HTM case, since the hardware actually implements strong atomicity,
> for atomically accessible data, is it correct to implement
>
> atomic do { r = data; }
>
> as essentially
>
> r.load(data, memory_order_seq_cst);
>
> ?
>
> That would very partially address the concern we had during the
> meeting, and maybe make Michael Scott happier. With some implementation
> effort simple transactional loads could be made cheap on x86 and maybe
> cheap enough on ARM. It would unfortunately require a less trivial
> implementation to do so.
>
> Hans
>
> On Tue, Oct 20, 2020 at 6:16 PM Hans Boehm <boehm_at_[hidden]> wrote:
>
>> Attendees: Hans Boehm, Victor Luchangco, Jens Maurer (at the beginning),
>> Michael Scott, Mike Spear, and Michael Wong
>>
>> Initially discussed p2066r4 and vector::size(), and the question of
>> whether it is and should be included in the set of guaranteed
>> transaction-safe functions.
>>
>> We agreed that the current definition of "accessors" is not clear enough.
>> At this point, there also seemed to be close to consensus that we should
>> try to initially keep things as simple as possible, and avoid anything that
>> might look scary to implementors.
>>
>> We then tried to switch back to the discussion of including new.
>>
>> Mike Spear argued that adding new and supporting non-escaping exceptions
>> doesn't really complicate simple implementations. Both single-global-lock
>> and HTM implementations still work.
>>
>> Mike Spear: What can you actually do with the p2066 proposal?
>>
>> Michael Scott: implement lock-free data structures like linked lists that
>> read without a lock, and then transactionally update it at the right place.
>>
>> Mike Spear: No, because the initial reads can't be on atomic data.
>>
>> Can't really use these transactions as an NCAS replacement. All read
>> accesses preceding NCAS would also have to be transactional, which is too
>> expensive for something like Intel HTM. Transactions cost as much as locks,
>> so surrounding every read by one is not practical.
>>
>> Victor L.: Transactional reads could work for tree traversal followed by
>> update, which is probably more common.
>>
>> Clarification that this avoids contention in an HTM context, but still
>> has latency issues even in the uncontended case.
>>
>> Discussion of double-checked locking, with a similar conclusion. It's not
>> practical to put the initial read in a transaction.
>>
>> We unfortunately concluded that we either need more of a usability
>> argument for what we have, or probably a revision of both the design and
>> wording documents to allow new + caught exceptions. If we had the latter,
>> most of the STL data structures could become usable in transactions. Given
>> that roll-your-own data structures don't seem terribly viable, that would
>> give us a much better usability argument.
>>
>> Closed with a strong recommendation to continue this discussion online,
>> and not wait until the next meeting.
>>
>> Next meeting:
>>
>> Nov. 17, 8:00 PST, 11:00 EST
>> Join with Google Meet
>> meet.google.com/sbj-cvgh-vnd
>> Join by phone
>> ‪(US) +1 208-925-0196‬ PIN: ‪255 542‬#
>>
>>
>>
>> --
> SG5 mailing list
> SG5_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg5
>

Received on 2020-10-21 08:54:52