Hans, thanks for the great summary of our discussion.  In the spirit of "keeping the conversation going", I want to point out the following about implementation.

If an implementation protects all transactions with a single, global, reentrant lock, then there are three pieces to the implementation.  
- The first is modifying the parser to support the new syntax.  This is common to any implementation, so whatever effort it entails is really a fixed cost for the TS.  
- The second is some kind of run-time library that maps the beginning and end of a lexically scoped transaction to operations for lock acquire and lock release.  This is quite simple.
- The third is whatever static analysis is needed to warn programmers when their atomic blocks do something forbidden.  This may be difficult, depending on how we specify things.  For example, if the compiler *must* warn when an exception escapes a transaction, then if the compiler does not have access to definition of a function that might throw, how could it give such a warning?

My guess is that the third piece is going to be "best effort" by the compiler writer.  Surely some things should produce warnings, but with an understanding that there might be false negatives (e.g., we miss a warning for a transaction that has undefined behavior, because the transaction spans TUs and in some other TU, it uses std::cout or a std::atomic variable, or opens a socket, or does something else that can legally constitute inter-thread communication).

Turning to the second popular implementation, which uses Intel/ARM/IBM HTM, note that pieces 1 and 3 are unchanged.  As for piece 2, it is only slightly more complicated, still quite simple.

Now on to the third implementation, with STM.  Task 2 is much harder, because we have to instrument the reads and writes in the bodies of functions called from the transaction.  The key simplification here is that we do not support explicit self-abort.  While the user might assume that the implementation uses speculation to get better concurrency, there is no speculation or rollback in the proposed TS.  This means that even a sophisticated STM implementation can fall back to serializing on a single global lock.  For multi-TU transactions, such fallback is expected in the proposed TS, because we do not want viral annotations on functions, and we did not specify "transaction-safe" annotations or hints.

From a pragmatic perspective, this means any function in the C++ standard library can simply be declared "transaction safe", as long as there is a reasonable argument for doing so.  Since HTM and STM both are allowed to fall back to a lock at any time, the only thing we really need to do is ensure that a single lock implementation that allowed standard library function X would still be correct, and then it should be valid to say that standard library function X is safe to call from within transactions.

In the case of memory allocation, the argument is straightforward: making assumptions about the bit representation of the return value of operator new is not legal.  Correct C++ programs cannot use return values from operator new to infer the behavior of other threads, or to communicate with each other, because doing so is not portable, and the implementation of the allocator is not defined by the specification.  Therefore, as long as an STM transaction's call to operator new or operator delete can be "logically undone" if the run-time system aborts and restarts the transaction, there is no semantic issue.  At the implementation level, operator new might call sbrk() or mmap(), which could cause an HTM transaction to fall back to serializing, but that's not a correctness issue.

In the case of throw, the expression of exceptions in the intermediate representation is via function calls or intrinsics (cxa_throw, cxa_catch, LLVM landing pads, etc).  The implementation of these is dependent on the OS and architecture.  For example, IIRC on Linux targets they traverse DWARF tables within the binary itself.  So, again, making assumptions about these is not portable, and correct programs cannot assume anything more than the high-level specified behavior.  That behavior is little more than "nonlocal jump, with destruction of everything on the popped part of the run-time stack".  Note: if I am wrong, and the behavior of stack unwinding is more carefully defined, and that definition includes the possibility of stack unwinding being a legal form of inter-thread communication, then my whole argument is invalid.  Hopefully someone more knowledgeable about exceptions in C++ can confirm or refute my claim.

In the original TMTS, exceptions are tricky because there need to be writes to the exception object, and these writes need to be instrumented in a special way, because these writes can escape the transaction even if the transaction gets undone.  In the proposed TMTS, exceptions cannot escape a transaction (and the transaction doesn't ever have to get undone), which means that STM can fall back to a single lock whenever it encounters a throw.  This is easy to do, because the STM instrumentation pass simply says "here's a function for which I don't have a definition... I guess I'll just serialize the transaction here".  cxa_throw is no different than a multi-TU transaction.

If we *wanted* to specify commit-on-exception-escape, it would be trivial to support in a reasonable TM implementation.  I don't think we want to, and I'm in favor of forbidding exceptions from escaping transactions.  However, the use of exceptions within a transaction does not appear to pose an implementation challenge, nor does it appear to pose a correctness challenge.

So, in summary, I do not see implementation issues with supporting throw+catch inside a transaction, or allocation inside a transaction.  As the minutes state, this will open up a huge portion of the standard library to transactions (perhaps only threads, coroutines, mutexes, atomics, and iostreams would still be forbidden?).

- Mike

On Wed, Oct 21, 2020 at 12:28 AM Hans Boehm via SG5 <sg5@lists.isocpp.org> wrote:
Following up on our resolution to discuss this between meetings:

In the HTM case, since the hardware actually implements strong atomicity, for atomically accessible data, is it correct to implement

atomic do { r = data; }

as essentially 

r.load(data, memory_order_seq_cst);

?

That would very partially address the concern we had during the meeting, and maybe make Michael Scott happier. With some implementation effort simple transactional loads could be made cheap on x86 and maybe cheap enough on ARM. It would unfortunately require a less trivial implementation to do so.

Hans

On Tue, Oct 20, 2020 at 6:16 PM Hans Boehm <boehm@acm.org> wrote:
Attendees: Hans Boehm, Victor Luchangco, Jens Maurer (at the beginning), Michael Scott, Mike Spear, and Michael Wong

Initially discussed p2066r4 and vector::size(), and the question of whether it is and should be included in the set of guaranteed transaction-safe functions.

We agreed that the current definition of "accessors" is not clear enough. At this point, there also seemed to be close to consensus that we should try to initially keep things as simple as possible, and avoid anything that might look scary to implementors.

We then tried to switch back to the discussion of including new.

Mike Spear argued that adding new and supporting non-escaping exceptions doesn't really complicate simple implementations. Both single-global-lock and HTM implementations still work.

Mike Spear: What can you actually do with the p2066 proposal?

Michael Scott: implement lock-free data structures like linked lists that read without a lock, and then transactionally update it at the right place.

Mike Spear: No, because the initial reads can't be on atomic data.

Can't really use these transactions as an NCAS replacement. All read accesses preceding NCAS would also have to be transactional, which is too expensive for something like Intel HTM. Transactions cost as much as locks, so surrounding every read by one is not practical.

Victor L.: Transactional reads could work for tree traversal followed by update, which is probably more common.

Clarification that this avoids contention in an HTM context, but still has latency issues even in the uncontended case.

Discussion of double-checked locking, with a similar conclusion. It's not practical to put the initial read in a transaction.

We unfortunately concluded that we either need more of a usability argument for what we have, or probably a revision of both the design and wording documents to allow new + caught exceptions. If we had the latter, most of the STL data structures could become usable in transactions. Given that roll-your-own data structures don't seem terribly viable, that would give us a much better usability argument.

Closed with a strong recommendation to continue this discussion online, and not wait until the next meeting.

Next meeting:

Nov. 17, 8:00 PST, 11:00 EST
Join with Google Meet
meet.google.com/sbj-cvgh-vnd
Join by phone
‪(US) +1 208-925-0196‬ PIN: ‪255 542‬#



--
SG5 mailing list
SG5@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg5