ISOCPP std-proposals List: Re: [std-proposals] Allowing coroutine_handle::from_promise to accept a pointer-interconvertible object

From: Aaron Jacobs <jacobsa_at_[hidden]>
Date: Wed, 4 Jan 2023 18:12:42 +1100

On Wed, Jan 4, 2023 at 5:24 PM Lewis Baker <lewissbaker_at_[hidden]> wrote:
> For an example of how async stack traces were done at FB, see this
> blog post:

This is really useful, thanks. It's good validation to hear that it presented
similar problems to what I'm encountering, and that something like this could
allow removing workarounds that are costing runtime performance.

> With some tweaks to the ABI of coroutines to force the offset between
> the frame address and the promise address to be determined without
> needing to know the size/alignment of the promise type, much of this
> overhead could be eliminated.
>
> Adopting this sort of coroutine ABI in all compilers would be a
> necessary prerequisite for adopting the proposal you mention here.

I don't think I see this as a prerequisite. Could you elaborate?

I gave details in my original post about why this would be compatible with
the three major compilers with no change whatsoever. None of them use any
more information than the promise's address and alignment, and I've
intentionally worded the proposal so that these remain identical even with
the newly-legal inputs.

It doesn't seem like this requires an ABI change. You could further weaken
the requirements with an ABI change, but this proposal is forwards-compatible
with that as far as I can tell.

> Developers tend to be more careful about recursion as they've been
> trained to avoid unbounded/deep recursion due to the risk of stack
> overflow with normal functions and so I haven't really seen issues
> with stack-overflow due to deep levels of recursion of coroutines in
> practice.
>
> Do you have use cases where the recursion depth during destruction is
> a concern here?

This definitely comes up less frequently than the mutual recursion/async loop
case you mentioned, but it has come up naturally in my work with async
programming models at Google. The guaranteed tail-call feature gets us 95% of
the way to actually being able to use unbounded recursion, and fixing
destruction order seems to get us the rest of the way there. That's really
valuable! If it's feasible and the most elegant approach to a problem, I
don't see a reason not to allow unbounded chains.

> under Clang, the offset from the address of the function pointers
> to the address of the promise is dependent on the alignment of the
> promise.

Agreed. This is why I included the requirement that the alignments be
identical.

> Unfortunately, to my knowledge, I don't think the clang developers ever got
> around to implementing that ABI. The ABI for coroutines in clang would need
> to be updated before the change you propose here would be implementable in
> clang.

This does sound like a better ABI, but I think it's not necessary for my
proposal to work due to the alignment requirement (see above). If it were
ever adopted perhaps that requirement could be removed.

> In some earlier versions of MSVC c. 2017/2019, the layout of
> the coroutine frame had the promise object placed before the
> function-pointers. i.e. it had something like:
>
> struct __frame { promise_type promise; void(*resumeFn)(void*); //
> <-- coroutine address() pointed here - used for both resume/destroy
> std::uint32_t resumeIndex; // least-significant bit indicates (0 -
> resume, 1 - destroy) bool elided_allocation; // ... };
>
> Where the address of the promise was computed by subtracting the size
> of the promise (padded to alignment of a function pointer).
>
> The initial implementation divergence in coroutine frame layout where
> some of the compilers had an offset between the frame address and the
> promise object that was dependent on the layout of the promise was the
> main reason for the restrictions that are currently in place in C++20.

Yeah, Raymond Chen's post on this (linked from my original post) is
interesting. My understanding is that this was in a pre-standardization mode,
and that MSVC's C++20 mode has always used the modern ABI. The implementation
of coroutine_handle in MSVC's STL that I linked is not correct for this older
ABI.

> Instead of calling destroy() to unwind, we should ideally have a
> separate "resume_with_cancel" operation that we can resume the
> coroutine along a different path - a cancellation path that unwinds
> from the current suspend point back to final_suspend(), but without
> destroying the coroutine - and only allow destruction of the coroutine
> frame when the coroutine is suspended either at an initial or final
> suspend point. The fact that we have combined "destruction" and
> "unwind/cancellation" into the same operation causes problems in
> general for use-cases like async-generators where we might want to
> cancel the generator but still perform some async cleanup work.
>
> This was something the papers P1662, P1663 and P1745 identified
> and explored potential solutions to - having separate resumption
> paths along with symmetric-transfer leads to a need to separate the
> suspend-point (which represents many possible resumption paths) from
> the continuation (which represents a chosen resumption path). I'm
> not sure how viable taking an approach like that is now that we have
> standardised C++20 coroutines, however. It is an area for future
> investigation.

I agree this would be even better; I've often wanted async RAII and I look at
Rust's progress in that area with jealousy. I'd love to hear your thoughts
(perhaps off-thread or in a blog post?) on what can be done to further evolve
coroutines now that they've been standardized.

But: as far as I can tell, what I'm proposing here has zero runtime cost, has
zero need for the major implementations to change at all, and is
forwards-compatible with all of the other longer-term ideas you mentioned. It
also seems like it would have helped with Facebook's efforts, which hit the
same difficulty as Google. Assuming I can convince you of these things or we
can massage the wording so that they're true, is there any reason not to do
it?

Thanks,
Aaron

Received on 2023-01-04 07:13:09