Date: Tue, 3 Jan 2023 11:24:05 +1100
Hello all,
While working on a C++ coroutine library for use at Google I've found that some
important functionality is hindered by a precondition on
`std::coroutine_handle<Promise>::from_promise` that seems to be very slightly
stronger than necessary, and I'm seeking feedback on a proposal to weaken it
accordingly. I think this can be done while keeping existing implementations
conforming.
[coroutine.handle.con] currently lists the following precondition for
`from_promise(Promise& p)`:
> Preconditions: `p` is a reference to a promise object of a coroutine.
I propose changing it to something like the following (feedback on wording
welcome):
> Preconditions: p is a reference to an object that is
> pointer-interconvertible with the promise object of a coroutine, and
> has the same alignment as the promise.
While working on a C++ coroutine library for use at Google I've found that some
important functionality is hindered by a precondition on
`std::coroutine_handle<Promise>::from_promise` that seems to be very slightly
stronger than necessary, and I'm seeking feedback on a proposal to weaken it
accordingly. I think this can be done while keeping existing implementations
conforming.
[coroutine.handle.con] currently lists the following precondition for
`from_promise(Promise& p)`:
> Preconditions: `p` is a reference to a promise object of a coroutine.
I propose changing it to something like the following (feedback on wording
welcome):
> Preconditions: p is a reference to an object that is
> pointer-interconvertible with the promise object of a coroutine, and
> has the same alignment as the promise.
--- The motivation for this is to allow iterating over a call chain of suspended coroutines as a linked list, despite the fact that the coroutines in the call chain may have different promise types. Promise types naturally form a singly-linked list via coroutine handles to resume when a coroutine completes: struct NaivePromise { [...] // The coroutine to resume when my coroutine finishes. std::coroutine_handle<NaivePromise> waiter; }; But this doesn't work if there are multiple promise types involved, which is very common since a promise is likely to be templated on the coroutine's result type: template <typename Result> struct MoreRealisticPromise { [...] // The value co_returned by the coroutine. std::optional<Result> result; // The coroutine to resume when my coroutine finishes. std::coroutine_handle<???> waiter; }; In the example above we could use `std::coroutine_handle<>` for the waiter, but then we could no longer use it as a link in a list because there would be no UB-free way to get hold of the next node given a current node. Ideally we would be able to do this instead: struct PromiseBase { // The coroutine to resume when my coroutine finishes. std::coroutine_handle<PromiseBase> waiter; }; template <typename Result> struct Promise : PromiseBase { [...] // The value co_returned by the coroutine. std::optional<Result> result; }; static_assert(std::is_pointer_interconvertible_base_of_v< PromiseBase, Promise<int>>); static_assert(alignof(PromiseBase) == alignof(Promise<int>)); This retains the ability to both resume the waiter and to iterate over the chain of promises. Except it's not possible to do in a standard-compliant way as far as I can tell, because due to the precondition on `from_promise` there seems to be no UB-free wait to obtain a `std::coroutine_handle<PromiseBase>`. A reference to the `PromiseBase` sub-object of `Promise<int>` is not technically a reference to a coroutine's promise object. --- For example, consider this case where Foo co_awaits the result of a call Bar, which co_awaits the result of a call to Baz: Task<int32_t> Baz() { [...] co_return 0; } Task<int64_t> Bar() { co_return (co_await Baz()) + 1; } Task<void> Foo() { assert(co_await Bar() == 1); } While Baz is suspended, it is useful to be able iterate in reverse over the co_await chain Baz -> Bar -> Foo. I've found the need for this in two contexts so far: 1. To offer good async backtraces. For example if Baz crashes, then with a naive unwinding implementation the backtrace in the logs would show only the functions on the physical thread call stack. But that reveals only what most recently resumed Baz: Baz() WakeCoroutine() SomeRpcFinished() ABunchOfNetworkCode() [...] start_thread This is useful, but another useful aspect is what the logical call/await chain is among the coroutines: Baz() Bar() Foo() [...] See [1] for a similar effort in Rust. 2. To clean up after cancellation of a coroutine call/await chain without stack overflow, by destroying the callee's frame before the caller's frame. In my environment cancellation is expressed by synchronizing on being suspended and then destroying the coroutine frame without resuming from co_await/co_yield. This can be done by arranging for the task destructor to call `std::coroutine_handle<>::destroy` and then destroying the "root" coroutine, letting destructors take care of the rest. Except this may cause a stack overflow if there are many coroutines in the chain, since the compiler can't necessarily use a tail call in the task's destructor. This is out of character with everything else in the coroutine environment, which can typically be made to be use only O(1) stack frames by leaning on symmetric transfer of control. An alternative that avoids stack overflow is to have the cancellation process itself destroy the coroutine frames, from the leaf to the root, so that callee if correctly cleaned up before caller. But this requires the ability for the cancellation process to iterate over the coroutine handles: for (std::coroutine_handle<PromiseBase> h = leaf; h;) { const std::coroutine_handle<PromiseBase> next = h.promise().waiter; h.destroy(); h = next; } --- I'm not an expert in various compilers' implementations of coroutines, but from my limited understanding I think at least the major ones would all already be in compliance with the proposed more relaxed precondition. For example, clang's ABI is that the coroutine frame starts with two function pointers, followed by an integer index, followed by the promise object at an appropriate alignment. [2] [3] The libc++ implementation of `from_promise` is to use a clang intrinsic for getting base address of the coroutine frame that passes the address of the promise and its alignment [4], which by definition are identical for an object that is pointer-interconvertible and has the same alignment. The intrinsic's implementation [5] would have no trouble with this as far as I can tell, and in practice using the code above does work reliably in clang despite technically being UB. The story for libstdc++ and gcc seems to be identical [6] [7]. I don't have access to the MSVC source, but their STL also uses an intrinsic that simply accepts the address of the promise [8]. In this case there isn't even an alignment provided. (I don't see how it can be correct to leave out an alignment; perhaps it's a bug and over-aligned promises are not correctly supported.) --- Does anybody have any feedback on this, perhaps some implementation difficulty I've missed? Thanks, Aaron [1]: https://rust-lang.github.io/wg-async/design_docs/async_stack_traces.html [2]: https://www.llvm.org/docs/Coroutines.html [3]: https://devblogs.microsoft.com/oldnewthing/20211007-00/?p=105777 [4]: https://github.com/llvm/llvm-project/blob/9cf4419e/libcxx/include/__coroutine/coroutine_handle.h#L116-L123 [5]: https://github.com/llvm/llvm-project/blob/9cf4419e/llvm/lib/Transforms/Coroutines/CoroEarly.cpp#L54-L80 [6]: https://github.com/gcc-mirror/gcc/blob/ec1db901/libstdc%2B%2B-v3/include/std/coroutine#L199-L206 [7]: https://github.com/gcc-mirror/gcc/blob/ec1db901/gcc/coroutine-passes.cc#L88-L125 [8]: https://github.com/microsoft/STL/blob/8ddf4da2/stl/inc/coroutine#L101-L107
Received on 2023-01-03 00:24:33