C++ Logo

std-proposals

Advanced search

[std-proposals] Allowing coroutine_handle::from_promise to accept a pointer-interconvertible object

From: Aaron Jacobs <jacobsa_at_[hidden]>
Date: Tue, 3 Jan 2023 11:24:05 +1100
Hello all,

While working on a C++ coroutine library for use at Google I've found that some
important functionality is hindered by a precondition on
`std::coroutine_handle<Promise>::from_promise` that seems to be very slightly
stronger than necessary, and I'm seeking feedback on a proposal to weaken it
accordingly. I think this can be done while keeping existing implementations
conforming.

[coroutine.handle.con] currently lists the following precondition for
`from_promise(Promise& p)`:

> Preconditions: `p` is a reference to a promise object of a coroutine.

I propose changing it to something like the following (feedback on wording
welcome):

> Preconditions: p is a reference to an object that is
> pointer-interconvertible with the promise object of a coroutine, and
> has the same alignment as the promise.

---
The motivation for this is to allow iterating over a call chain of suspended
coroutines as a linked list, despite the fact that the coroutines in the call
chain may have different promise types. Promise types naturally form a
singly-linked list via coroutine handles to resume when a coroutine completes:
    struct NaivePromise {
      [...]
      // The coroutine to resume when my coroutine finishes.
      std::coroutine_handle<NaivePromise> waiter;
    };
But this doesn't work if there are multiple promise types involved, which is
very common since a promise is likely to be templated on the coroutine's result
type:
    template <typename Result>
    struct MoreRealisticPromise {
      [...]
      // The value co_returned by the coroutine.
      std::optional<Result> result;
      // The coroutine to resume when my coroutine finishes.
      std::coroutine_handle<???> waiter;
    };
In the example above we could use `std::coroutine_handle<>` for the waiter, but
then we could no longer use it as a link in a list because there would be no
UB-free way to get hold of the next node given a current node. Ideally we would
be able to do this instead:
    struct PromiseBase {
      // The coroutine to resume when my coroutine finishes.
      std::coroutine_handle<PromiseBase> waiter;
    };
    template <typename Result>
    struct Promise : PromiseBase {
      [...]
      // The value co_returned by the coroutine.
      std::optional<Result> result;
    };
    static_assert(std::is_pointer_interconvertible_base_of_v<
                      PromiseBase, Promise<int>>);
    static_assert(alignof(PromiseBase) == alignof(Promise<int>));
This retains the ability to both resume the waiter and to iterate over the
chain of promises. Except it's not possible to do in a standard-compliant way
as far as I can tell, because due to the precondition on `from_promise` there
seems to be no UB-free wait to obtain a `std::coroutine_handle<PromiseBase>`. A
reference to the `PromiseBase` sub-object of `Promise<int>` is not technically
a reference to a coroutine's promise object.
---
For example, consider this case where Foo co_awaits the result of a call Bar,
which co_awaits the result of a call to Baz:
    Task<int32_t> Baz() {
      [...]
      co_return 0;
    }
    Task<int64_t> Bar() {
      co_return (co_await Baz()) + 1;
    }
    Task<void> Foo() {
      assert(co_await Bar() == 1);
    }
While Baz is suspended, it is useful to be able iterate in reverse over the
co_await chain Baz -> Bar -> Foo. I've found the need for this in two contexts
so far:
1.  To offer good async backtraces. For example if Baz crashes, then with a
    naive unwinding implementation the backtrace in the logs would show only the
    functions on the physical thread call stack. But that reveals only what most
    recently resumed Baz:
        Baz()
        WakeCoroutine()
        SomeRpcFinished()
        ABunchOfNetworkCode()
        [...]
        start_thread
    This is useful, but another useful aspect is what the logical call/await
    chain is among the coroutines:
        Baz()
        Bar()
        Foo()
        [...]
    See [1] for a similar effort in Rust.
2.  To clean up after cancellation of a coroutine call/await chain without
    stack overflow, by destroying the callee's frame before the caller's frame.
    In my environment cancellation is expressed by synchronizing on being
    suspended and then destroying the coroutine frame without resuming from
    co_await/co_yield. This can be done by arranging for the task destructor to
    call `std::coroutine_handle<>::destroy` and then destroying the "root"
    coroutine, letting destructors take care of the rest. Except this may cause
    a stack overflow if there are many coroutines in the chain, since the
    compiler can't necessarily use a tail call in the task's destructor. This
    is out of character with everything else in the coroutine environment,
    which can typically be made to be use only O(1) stack frames by leaning on
    symmetric transfer of control.
    An alternative that avoids stack overflow is to have the cancellation
    process itself destroy the coroutine frames, from the leaf to the root, so
    that callee if correctly cleaned up before caller. But this requires the
    ability for the cancellation process to iterate over the coroutine handles:
        for (std::coroutine_handle<PromiseBase> h = leaf; h;) {
          const std::coroutine_handle<PromiseBase> next =
              h.promise().waiter;
          h.destroy();
          h = next;
        }
---
I'm not an expert in various compilers' implementations of coroutines, but from
my limited understanding I think at least the major ones would all already be
in compliance with the proposed more relaxed precondition.
For example, clang's ABI is that the coroutine frame starts with two function
pointers, followed by an integer index, followed by the promise object at an
appropriate alignment. [2] [3] The libc++ implementation of `from_promise` is
to use a clang intrinsic for getting base address of the coroutine frame that
passes the address of the promise and its alignment [4], which by definition
are identical for an object that is pointer-interconvertible and has the same
alignment. The intrinsic's implementation [5] would have no trouble with this
as far as I can tell, and in practice using the code above does work reliably
in clang despite technically being UB.
The story for libstdc++ and gcc seems to be identical [6] [7].
I don't have access to the MSVC source, but their STL also uses an intrinsic
that simply accepts the address of the promise [8]. In this case there isn't
even an alignment provided. (I don't see how it can be correct to leave out an
alignment; perhaps it's a bug and over-aligned promises are not correctly
supported.)
---
Does anybody have any feedback on this, perhaps some implementation difficulty
I've missed?
Thanks,
Aaron
[1]: https://rust-lang.github.io/wg-async/design_docs/async_stack_traces.html
[2]: https://www.llvm.org/docs/Coroutines.html
[3]: https://devblogs.microsoft.com/oldnewthing/20211007-00/?p=105777
[4]: https://github.com/llvm/llvm-project/blob/9cf4419e/libcxx/include/__coroutine/coroutine_handle.h#L116-L123
[5]: https://github.com/llvm/llvm-project/blob/9cf4419e/llvm/lib/Transforms/Coroutines/CoroEarly.cpp#L54-L80
[6]: https://github.com/gcc-mirror/gcc/blob/ec1db901/libstdc%2B%2B-v3/include/std/coroutine#L199-L206
[7]: https://github.com/gcc-mirror/gcc/blob/ec1db901/gcc/coroutine-passes.cc#L88-L125
[8]: https://github.com/microsoft/STL/blob/8ddf4da2/stl/inc/coroutine#L101-L107

Received on 2023-01-03 00:24:33