Date: Thu, 5 Jan 2023 14:11:14 +1030
On Wed, Jan 4, 2023 at 5:43 PM Aaron Jacobs <jacobsa_at_[hidden]> wrote:
> On Wed, Jan 4, 2023 at 5:24 PM Lewis Baker <lewissbaker_at_[hidden]> wrote:
>
> > With some tweaks to the ABI of coroutines to force the offset between
> > the frame address and the promise address to be determined without
> > needing to know the size/alignment of the promise type, much of this
> > overhead could be eliminated.
> >
> > Adopting this sort of coroutine ABI in all compilers would be a
> > necessary prerequisite for adopting the proposal you mention here.
>
> I don't think I see this as a prerequisite. Could you elaborate?
>
> I gave details in my original post about why this would be compatible with
> the three major compilers with no change whatsoever. None of them use any
> more information than the promise's address and alignment, and I've
> intentionally worded the proposal so that these remain identical even with
> the newly-legal inputs.
>
> It doesn't seem like this requires an ABI change. You could further weaken
> the requirements with an ABI change, but this proposal is
> forwards-compatible
> with that as far as I can tell.
>
I guess in my use cases, the constraint that the concrete promise type has
the same alignment as the base class that you want to construct the
coroutine_handle for was not something I could assume.
e.g. I have a PromiseBase class that contains the coroutine_handle
data-member for the parent coroutine, but then I have a derived Promise<T>
that holds storage for a T.
We had some cases where T was a matrix class from a numerical library that
was cache-line aligned (64 or 128 byte).
If the limitation was imposed that the base-class needed to have the same
alignment as the concrete promise type then we'd have to have the
PromiseBase declared to have the maximum possible alignment that we wanted
the concrete promise type to have. Although in theory it may also be
possible to reserve enough storage inside the promise to be able to
dynamically align storage for T inside the promise if T had alignment
larger than 2 pointers - this would come with increased complexity (need to
use placement new, manually manage lifetimes) and some runtime overhead
though.
If possible, I'd prefer to work towards a solution that did not have the
same alignment requirement.
> Developers tend to be more careful about recursion as they've been
> > trained to avoid unbounded/deep recursion due to the risk of stack
> > overflow with normal functions and so I haven't really seen issues
> > with stack-overflow due to deep levels of recursion of coroutines in
> > practice.
> >
> > Do you have use cases where the recursion depth during destruction is
> > a concern here?
>
> This definitely comes up less frequently than the mutual recursion/async
> loop
> case you mentioned, but it has come up naturally in my work with async
> programming models at Google. The guaranteed tail-call feature gets us 95%
> of
> the way to actually being able to use unbounded recursion, and fixing
> destruction order seems to get us the rest of the way there. That's really
> valuable! If it's feasible and the most elegant approach to a problem, I
> don't see a reason not to allow unbounded chains.
>
One of the downsides to manually walking the chain of coroutine frames in a
loop and calling destroy() on each of them is that this will inhibit the
compiler's ability to elide the allocations of those coroutines. So every
coroutine frame will end up being a separate heap allocation, regardless of
whether the call-stack is recursive or not.
If instead you cancel by resuming the leaf-most coroutine with a "cancel"
result and letting the cancel result naturally bubble back up to the top
level then you can achieve O(1) cancellation of deep stacks without the
recursive destroy() problem and while still preserving the ability for the
compiler to elide coroutine frame allocations. This is the approach that
folly::coro::Task takes - cancellation result is reported as an
OperationCancelled error (either as an exception or wrapped in a
folly::Try<T>).
> > under Clang, the offset from the address of the function pointers
> > to the address of the promise is dependent on the alignment of the
> > promise.
>
> Agreed. This is why I included the requirement that the alignments be
> identical.
>
Sorry, I must have missed that constraint in your original post.
I agree if you apply that constraint then it's implementable.
I'm not sure if this constraint is desirable, though.
>
> > Instead of calling destroy() to unwind, we should ideally have a
> > separate "resume_with_cancel" operation that we can resume the
> > coroutine along a different path - a cancellation path that unwinds
> > from the current suspend point back to final_suspend(), but without
> > destroying the coroutine - and only allow destruction of the coroutine
> > frame when the coroutine is suspended either at an initial or final
> > suspend point. The fact that we have combined "destruction" and
> > "unwind/cancellation" into the same operation causes problems in
> > general for use-cases like async-generators where we might want to
> > cancel the generator but still perform some async cleanup work.
> >
> > This was something the papers P1662, P1663 and P1745 identified
> > and explored potential solutions to - having separate resumption
> > paths along with symmetric-transfer leads to a need to separate the
> > suspend-point (which represents many possible resumption paths) from
> > the continuation (which represents a chosen resumption path). I'm
> > not sure how viable taking an approach like that is now that we have
> > standardised C++20 coroutines, however. It is an area for future
> > investigation.
>
> I agree this would be even better; I've often wanted async RAII and I look
> at
> Rust's progress in that area with jealousy. I'd love to hear your thoughts
> (perhaps off-thread or in a blog post?) on what can be done to further
> evolve
> coroutines now that they've been standardized.
>
> But: as far as I can tell, what I'm proposing here has zero runtime cost,
> has
> zero need for the major implementations to change at all, and is
> forwards-compatible with all of the other longer-term ideas you mentioned.
> It
> also seems like it would have helped with Facebook's efforts, which hit the
> same difficulty as Google.
My main concern is whether the constraint that the promise-types have the
same alignment is a problematic constraint or not.
I can see that there would be some ways to work around this constraint when
dealing with types that are overaligned, but it would be difficult to
diagnose at compile-time by the compiler or standard library and so could
become a bit of a footgun.
My preference would be to relax the same-alignment constraint if possible.
I don't *think *updating Clang to use the improved ABI with the padding
before the function-pointers would be a large change - so it might be worth
exploring sooner if it would enable relaxing this alignment constraint.
Although anything touching ABI is potentially problematic for someone...
> Assuming I can convince you of these things or we
> can massage the wording so that they're true, is there any reason not to do
> it?
>
The main reason not to do something like this would be if implementers
wanted to retain implementation flexibility in the coroutine frame layout
here.
I can't really speak to this, however, so it'd be useful to hear from
implementers as to whether this would be overly constraining.
- Lewis
> On Wed, Jan 4, 2023 at 5:24 PM Lewis Baker <lewissbaker_at_[hidden]> wrote:
>
> > With some tweaks to the ABI of coroutines to force the offset between
> > the frame address and the promise address to be determined without
> > needing to know the size/alignment of the promise type, much of this
> > overhead could be eliminated.
> >
> > Adopting this sort of coroutine ABI in all compilers would be a
> > necessary prerequisite for adopting the proposal you mention here.
>
> I don't think I see this as a prerequisite. Could you elaborate?
>
> I gave details in my original post about why this would be compatible with
> the three major compilers with no change whatsoever. None of them use any
> more information than the promise's address and alignment, and I've
> intentionally worded the proposal so that these remain identical even with
> the newly-legal inputs.
>
> It doesn't seem like this requires an ABI change. You could further weaken
> the requirements with an ABI change, but this proposal is
> forwards-compatible
> with that as far as I can tell.
>
I guess in my use cases, the constraint that the concrete promise type has
the same alignment as the base class that you want to construct the
coroutine_handle for was not something I could assume.
e.g. I have a PromiseBase class that contains the coroutine_handle
data-member for the parent coroutine, but then I have a derived Promise<T>
that holds storage for a T.
We had some cases where T was a matrix class from a numerical library that
was cache-line aligned (64 or 128 byte).
If the limitation was imposed that the base-class needed to have the same
alignment as the concrete promise type then we'd have to have the
PromiseBase declared to have the maximum possible alignment that we wanted
the concrete promise type to have. Although in theory it may also be
possible to reserve enough storage inside the promise to be able to
dynamically align storage for T inside the promise if T had alignment
larger than 2 pointers - this would come with increased complexity (need to
use placement new, manually manage lifetimes) and some runtime overhead
though.
If possible, I'd prefer to work towards a solution that did not have the
same alignment requirement.
> Developers tend to be more careful about recursion as they've been
> > trained to avoid unbounded/deep recursion due to the risk of stack
> > overflow with normal functions and so I haven't really seen issues
> > with stack-overflow due to deep levels of recursion of coroutines in
> > practice.
> >
> > Do you have use cases where the recursion depth during destruction is
> > a concern here?
>
> This definitely comes up less frequently than the mutual recursion/async
> loop
> case you mentioned, but it has come up naturally in my work with async
> programming models at Google. The guaranteed tail-call feature gets us 95%
> of
> the way to actually being able to use unbounded recursion, and fixing
> destruction order seems to get us the rest of the way there. That's really
> valuable! If it's feasible and the most elegant approach to a problem, I
> don't see a reason not to allow unbounded chains.
>
One of the downsides to manually walking the chain of coroutine frames in a
loop and calling destroy() on each of them is that this will inhibit the
compiler's ability to elide the allocations of those coroutines. So every
coroutine frame will end up being a separate heap allocation, regardless of
whether the call-stack is recursive or not.
If instead you cancel by resuming the leaf-most coroutine with a "cancel"
result and letting the cancel result naturally bubble back up to the top
level then you can achieve O(1) cancellation of deep stacks without the
recursive destroy() problem and while still preserving the ability for the
compiler to elide coroutine frame allocations. This is the approach that
folly::coro::Task takes - cancellation result is reported as an
OperationCancelled error (either as an exception or wrapped in a
folly::Try<T>).
> > under Clang, the offset from the address of the function pointers
> > to the address of the promise is dependent on the alignment of the
> > promise.
>
> Agreed. This is why I included the requirement that the alignments be
> identical.
>
Sorry, I must have missed that constraint in your original post.
I agree if you apply that constraint then it's implementable.
I'm not sure if this constraint is desirable, though.
>
> > Instead of calling destroy() to unwind, we should ideally have a
> > separate "resume_with_cancel" operation that we can resume the
> > coroutine along a different path - a cancellation path that unwinds
> > from the current suspend point back to final_suspend(), but without
> > destroying the coroutine - and only allow destruction of the coroutine
> > frame when the coroutine is suspended either at an initial or final
> > suspend point. The fact that we have combined "destruction" and
> > "unwind/cancellation" into the same operation causes problems in
> > general for use-cases like async-generators where we might want to
> > cancel the generator but still perform some async cleanup work.
> >
> > This was something the papers P1662, P1663 and P1745 identified
> > and explored potential solutions to - having separate resumption
> > paths along with symmetric-transfer leads to a need to separate the
> > suspend-point (which represents many possible resumption paths) from
> > the continuation (which represents a chosen resumption path). I'm
> > not sure how viable taking an approach like that is now that we have
> > standardised C++20 coroutines, however. It is an area for future
> > investigation.
>
> I agree this would be even better; I've often wanted async RAII and I look
> at
> Rust's progress in that area with jealousy. I'd love to hear your thoughts
> (perhaps off-thread or in a blog post?) on what can be done to further
> evolve
> coroutines now that they've been standardized.
>
> But: as far as I can tell, what I'm proposing here has zero runtime cost,
> has
> zero need for the major implementations to change at all, and is
> forwards-compatible with all of the other longer-term ideas you mentioned.
> It
> also seems like it would have helped with Facebook's efforts, which hit the
> same difficulty as Google.
My main concern is whether the constraint that the promise-types have the
same alignment is a problematic constraint or not.
I can see that there would be some ways to work around this constraint when
dealing with types that are overaligned, but it would be difficult to
diagnose at compile-time by the compiler or standard library and so could
become a bit of a footgun.
My preference would be to relax the same-alignment constraint if possible.
I don't *think *updating Clang to use the improved ABI with the padding
before the function-pointers would be a large change - so it might be worth
exploring sooner if it would enable relaxing this alignment constraint.
Although anything touching ABI is potentially problematic for someone...
> Assuming I can convince you of these things or we
> can massage the wording so that they're true, is there any reason not to do
> it?
>
The main reason not to do something like this would be if implementers
wanted to retain implementation flexibility in the coroutine frame layout
here.
I can't really speak to this, however, so it'd be useful to hear from
implementers as to whether this would be overly constraining.
- Lewis
Received on 2023-01-05 03:41:28