sg5: Re: [SG5] [EXTERNAL] Fwd: TM-lite proposal

From: Hans Boehm <boehm_at_[hidden]>
Date: Mon, 6 Jan 2020 16:57:10 -0800

Apologies for the very late follow-up here.

Thanks for the responses!

I'm not sure we communicated the use of constexpr here properly. At least
my view is the problems we are trying to solve here are:

1) Assuming we don't have a mechanism for explicitly informing the compiler
that certain functions are callable from transactions, nontrivial software
implementations will only work if the compiler can see all the code in a
transaction. In a purely HTM environment, or with the trivial global lock
implementation, that isn't necessary. But we don't want to preclude
software implementations. I expect that precluding software implementations
would also provoke opposition from some vendors. We conjecture that the
vast majority of constexpr functions will have that property, since
otherwise they couldn't be executed at compile time.

2) We conjecture that most constexpr functions will “do only ordinary
(non-atomic) run-time memory operations.” (I don't understand the comment
about compile-time IO; it seems to me that needs wildly different
semantics, since the IO device may literally not exist yet at compile time.
Can you clarify?) There will be exceptions.

3) We don't want to have to specify transaction-safety for every single
standard library function. But in general, whether a standard library
function is defined in a header file or a separate translation unit is an
implementation property. We're looking for a solution to get us out of
maybe 95% of that business.

4) We realize that constexpr is somewhat of a specification hack here. It's
not semantically the right property. We're hoping it's close enough to
provide a reasonable default. The text will say "unless otherwise
specified", and we fully expect there will be cases in which we will have
to explicitly specify. We're more than open to better specification hacks.
But I think we need to deal with (1).

I agree that in many cases, it would be really nice if we could detect
conflicts at a higher level. That seems like it's still a research issue?
It also seems to require at least the kind of hairy nested transaction
support that we've been trying to get away from? x++ commutes with x++, but
to take advantage of that inside concurrent transactions, they individually
need to run transactionally as well, right?

On Tue, Nov 26, 2019 at 12:54 PM Herb Sutter <hsutter_at_[hidden]> wrote:

> Thanks for the update, Hans and everyone.
>
>
>
> It looks like a promising direction and I’m glad to see it’s avoiding
> mission/feature creep.
>
>
>
> Like Tim says, I think library-based prototypes are useful.
> Later/eventually if we actually standardize something in this area we will
> likely want atomic { } blocks, which naturally avoid some of the issues
> with the others (such as making it easy to require stack-based
> transactions). But no need to snap to that now if it’s not needed for
> experimentation and validation.
>
>
>
> Tim makes excellent points about conflicts and scalability. Though if the
> tree examples are balanced rb/avl trees as it sounds like, IIUC the
> conflicts may be actually real.
>
>
>
>
>
> Re transaction safety of libraries: I think the standard library shouldn’t
> be a special case, but there should be a simple general answer to whether
> users can use arbitrary libraries’ features (including std::, but also
> Boost, Qt, Poco, etc.) inside transactions. I think users are going to
> assume reasonable all-or-nothing behavior without knowing anything more
> than whether the callee touches only memory using ordinary (non-atomic<>)
> operations – any call tree they know does nothing but touching memory ought
> to Just Work, right? That would be my expectation as a user, and it’s
> something I can teach. If there’s something missing with that please let me
> know. But if so, then rather than documenting what things are
> transaction-safe, wouldn’t it be a more general matter of just
> knowing/documenting/annotating what functions in a library (including
> std::) are permitted to have side effects other than ordinary memory
> operations?
>
>
>
> Constexpr (and, even more strongly, consteval) seems like the wrong place
> to draw a line between transactional/not-transactional. Constexpr is not
> directly related to memory operations which is the key thing. Plus we’re on
> a path where eventually nearly everything will be constexpr, including I/O
> and printf, and we already have multiple implementation examples, including:
>
> - Andrew Sutton’s and Wyatt Childers’ implementation of P0707 and the
> reflection proposals has compiler.print, which is fully consteval but
> couldn’t be in any TM transaction.
> - Sean Baxter’s Circle compiler <https://www.circle-lang.org/> already
> does compile-time invocation of real printf (it’s the compiler’s very first
> “I was successfully installed” sanity check), which again is consteval and
> not TM-able.
>
>
>
> ISTM the clear line to define “function is transactional” should be “does
> only ordinary (non-atomic) run-time memory operations.” (We can then
> someday later have a separate but surely fun-filled discussion about
> whether/why/how consteval concurrency/std::thread/TM.)
>
>
>
> I haven’t thought about how this interacts with coroutines.
>
>
>
> Thanks for the note!
>
>
>
> Herb
>
>
>
>
>
>
>
> *From:* Tim Sweeney <tim.sweeney_at_[hidden]>
> *Sent:* Monday, November 25, 2019 7:46 PM
> *To:* boehm_at_[hidden]
> *Cc:* sg5_at_[hidden]; Herb Sutter <hsutter_at_[hidden]>
> *Subject:* [EXTERNAL] Fwd: TM-lite proposal
>
>
>
> Hi Hans,
>
>
>
> I like std::tm_exec as a simple solution that can try hardware (if
> available) and fall back to software (initially with just a global local),
> leaving room for future software fallback improvements. Given the
> uncertainty about hardware and even the viability of transactions using
> current libraries, we should avoid anything more complicated or elaborate
> than std::tm_exec.
>
>
>
> We'll see continued investment in speculative execution issues; CPU
> architects feel it's their job to solve hard problems of maximizing code
> throughput and all I've talked to seem undaunted. I've been advocating for
> speculative execution hiding the latency of atomics and TSX by assuming
> uncontended success and speculatively executing ahead, backtracking if that
> assumption proves wrong. If we had that, we could even ask future CPUs to
> guarantee strong memory ordering, and use speculation to hide the cost in
> the uncontended case...
>
>
>
> However...
>
>
>
> 1) Current TSX works but it's not 100% clear that it's worth continuing to
> make available on future CPUs. The throughput cost (~60 clocks) makes it
> expensive for simple DCAS style usage, and being bound by cache footprint
> prevents it from being useful for large transactions. Thus what's left is
> the middle, which is a benefit in some specialized database applications,
> but we won't see OS kernels redesigned any time soon.
>
>
>
> 2) Expanding TSX to be unbounded by cache size has been investigated and
> is possible, but it's not clear that actual applications will be able to
> utilize it due to the rate of transaction conflicts, both real and false.
>
>
>
> 3) The problem of false conflicts is significant in real-world code. For
> example, consider an ordered list data structure implemented as a tree.
> Logically, if two transactions each add a distinct value to the ordered
> list, that shouldn't cause read-write dependency. But physically, many
> implementations would update parent nodes, causing false dependencies that
> force the transactions to be serialized.
>
>
>
> The problem of false conflicts isn't specific to TSX or even to hardware
> transactions, but would be a flaw in any C++ language-level solution that
> tracked read-write dependencies in memory. Ultimately, these read-write
> tracking implementations say that computations A and B can run in parallel
> if they don't have any physical write-read or write-write dependencies. But
> what we really want to ask is whether the high-level observable state of
> the program is commutative: if it's the same if we run A then B as if we
> had run B then A. For a specific object, consider a game that handles
> collision detection using an octree. The fact that a monster is moving in
> my backyard shouldn't affect collision detection in my kitchen.
>
>
>
> Various data structures would want different ways of determining
> commutativity that are less conflict-prone than low-level read-write
> dependencies, but the design space has barely been explored. For example,
> in the monster case, we want to be able to run a lot of read-write
> move_monster() operation in parallel with a bunch of read-only
> check_collision() operations, and as they retire, ensure that the
> return-value result of the check_collision() operations were unchanged by
> the move_monster() operations, instead of asking whether they read and
> wrote the same memory. Most games actually use a version of this idea to
> hide network latency, by speculatively predicting local player movement,
> and correcting and rerunning if wrong.
>
>
>
> Because of TSX being limited to tracking read-write dependencies on cache
> lines, and any broader solution being extremely complex, I think we'll only
> see TSX used for small transactions. My advice to CPU architects is to
> treat TSX as a fancy way of enabling DCAS supporting a bit of logic and
> control flow. You'd use it only for small, low-level, library-like
> operations like adding an element to a double-linked list in a thread-safe
> way.
>
>
>
> *Conjecture*
>
>
>
> I believe that bigger transactions will rely on a software implementation,
> which uses new transaction-friendly libraries replacing the std container
> classes, and gives programmers the ability to implement data structures
> that specify, in code, whether transactions commute. Ideally such a library
> would provide:
>
>
>
> - container data types (notionally immutable)
>
> - future<t>
>
> - var<t> for atomic and transaction-safe mutable data
>
> - transactions based on isolated speculative execution of code that can be
> committed or aborted
>
> - an effects system for tracking read/write/exception/io/user
> dependencies, sequencing, and commutativity
>
> - concurrent garbage collection
>
>
>
> I've been experimenting with this approach in my spare time for the past
> few years. The constraints each aspect composes turn out to be fairly
> complementary, but the overhead is daunting. If we could scale Fortnite
> from 100 players in a map to 1000 players, we'd be happy to pay a 4x
> overhead and run the simulation on a 40-core server. But I'm pretty far
> from "just 4x".
>
>
>
> -Tim
>
>
>
> On Mon, Nov 25, 2019 at 8:09 PM Hans Boehm <boehm_at_[hidden]> wrote:
>
> Tim, Herb -
>
>
>
> As you may have seen, SG5 generated an initial proposal for a simplified
> transactional memory facility. See wg21.link/p1875. This was reviewed by
> SG1. See http://wiki.edg.com/bin/view/Wg21belfast/P1875r0
> <https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwiki.edg.com%2Fbin%2Fview%2FWg21belfast%2FP1875r0&data=02%7C01%7Chsutter%40microsoft.com%7C1ab6053bc2c84d3f3d0d08d772232d48%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637103367726062298&sdata=DN95Zkq6NZR1sQB2v2bOqkpV6Qqajf0%2F9R0Fhteq1a0%3D&reserved=0>,
> or my short summary below. We'd be very interested in any additional
> reactions you may have.
>
>
>
> One additional concern, brought up in later SG5 discussions, is that we're
> not terribly clear on the perception/status of various hardware
> transactional memory (HTM) facilities, in light of the fact that current
> best effort HTM facilities seem ideally suited for side-channel attacks
> that need to inspect cache state. Given that this facility is becoming more
> focused on exposing hardware support, this seems relevant. None of us have
> a good idea where hardware vendors are going with this.
>
>
>
> Thanks!
>
>
>
> Hans
>
>
>
> My summary of SG1 reaction to proposal [with some comments from subsequent
> SG5 discussion] :
>
> -------------------------------------------------------
>
>
>
> Generally people seemed positively inclined. People liked the idea of
> cutting back on the original TS. There weren't really any requests to put
> specific pieces back.
>
> Many people would like a mechanism that allows relatively low-level
> portable access to HTM facilities. That generally seemed to be the highest
> priority, and is consistent with what we've heard before.
>
> Along the same lines, there was apprehension about not being able to get
> feedback as to whether a transaction was actually executing in HTM. Several
> people wanted a way to explicitly observe retries. I think especially the
> HPC community needs something like that to diagnose the performance
> problems that otherwise arise from a single global lock fallback.
>
> There was concern that the RAII-based syntax allowed semi-nonsensical
> constructs that would be hard to handle in the implementation. What happens
> if the transaction_scope object is allocated somewhere other than on the
> stack? How would you implement that? Even in cases in which it is owned by
> a stack object, and thus does kind-of make sense, it seems problematic.
>
> We did end up taking a quick poll about the two other options in SG1, and
> it was 8 to 3 in favor of language syntax, with a significant number of
> people not voting. I have little confidence that this would go the same way
> in other groups. I may ask around to get some impression of how this might
> go in EWG, which actually has more of a voice here. [ SG5 is happy with
> going the syntax route, justified by varargs issues with the lambda
> approach, and SG1 concerns about the RAII approach. ]
>
> There was some interesting discussion of co_await in transactions, and
> "coroutines" in general. (C++ "coroutines" are very customizable, and may
> not behave much like actual coroutines. So this question is probably a bit
> more important and subtle than it seems at first glance.) We need to think
> about this when drafting wording. I currently don't have good intuitions
> about this. [ SG5 consensus afterwards was to initially not require support
> for anything coroutine-related in transactions. The current plan is to
> allow implementations to support anything they want in transactions anyway.
> ]
>
> The abuse of constexpr to define a minimal set of transaction-safe library
> facilities was viewed with suspicion. It was pointed out that we would
> probably need to list some exceptions in both directions. (Which I think is
> OK, since I view this as an opportunistic specification mechanism to avoid
> specifying transaction-safety for every standard library routine.) I did
> not hear an actual better alternative.
>
>

Received on 2020-01-06 18:59:52