Re: Re: Delay the judgement for coroutine function after the instantiation of template entity.

From: Jason McKesson <jmckesson_at_[hidden]>
Date: Thu, 21 Jan 2021 10:23:35 -0500
On Wed, Jan 20, 2021 at 9:45 PM chuanqi.xcq via Std-Proposals
<std-proposals_at_[hidden]> wrote:
> > I'm curious to see a complex example too. Especially since the main
> > point of a coroutine is its ability to `co_await` on other processes,
> > and that's not something that's very easy to `if constexpr` your way
> > around. Especially since the variables created in an `if constexpr`
> > block are local to that block, so you can't exactly do this:
> >
> > ```
> > if constexpr(UseCoro)
> > {
> >     auto value = co_await(coro_expr);
> > }
> > else
> > {
> >     auto value = regular_expr;
> > }
> >
> > // Use `value`
> > ```
>
> Yes, my personal solution, based on constexpr-if support for coroutines, is:
> ```
> #define MAYCOAWAITEXPR(UseCoro, GetMethod, ARGS...)    \
>     ({                                                 \
>         decltype(GetMethod<false>(ARGS)) value;        \
>         if constexpr (UseCoro)                         \
>             value = co_await GetMethod<UseCoro>(ARGS); \
>         else                                           \
>             value = GetMethod<UseCoro>(ARGS);          \
>         value;                                         \
>     })
>
> auto value = MAYCOAWAITEXPR(UseCoro, GetMethod);
> ```
> The macro MAYCOAWAITEXPR uses the GNU statement-expression extension.
>
> > I do agree that it would be useful if Chuanqi provided a more fleshed-out and realistic example.
> > I'm curious to see a complex example too.
>
> Let me try to give a precise, short example. In our situation, the original code consists of normal functions and chains of calls. The longest chain of calls can be nearly 40 deep.
>
> After stackless coroutines were known to be in C++20 in 2019, we tried to use coroutines to refactor our code extensively, which meant a lot of effort.

Here's what I don't really understand: why? What is this code doing
that it makes sense to convert the *entire* callstack into a series of
coroutines? Making an entire sequence of calls coroutines seems like
overkill, unless all of these functions are big and are all being
executed asynchronously *separately* from the other functions in the
stack.

Let's say that we have a function X which is, by its nature, an
asynchronous coroutine function. This means that X has to schedule its
resumption based on some external asynchronous process, like doing
file/networking IO, etc. Doing this is *why* you made the function a
coroutine; it is the nature of X that it waits on something
asynchronously. And let's say that we have some function Y which gets
called by X.

Just because X is a coroutine doesn't mean that Y has to be one too. Y
only needs to be a coroutine if it needs to pause its execution and
resume it based on some external asynchronous process. And this is
(usually) something that is *intrinsic* to the very nature of a
function. That is, whether Y is a coroutine or not is a property of
doing whatever it is that Y is *doing*, not how Y gets *called*.

The only case I could imagine where Y may or may not be a coroutine is
if part of Y's execution is determined by its caller, through being
given a callable object of some kind. If the callable invokes an
asynchronous process, then Y might want to schedule its resumption
with that process and thus Y would want to be a coroutine. But if the
callable doesn't invoke an asynchronous process, then Y doesn't need
to schedule its resumption.

Note that this is a pretty fundamental problem with `co_await`-style
coroutines *in general*. You can't use `std::for_each` if the functor
is a coroutine, as this would require `for_each` to manually
`co_await` on each invocation of the functor. And that's not how
`for_each` is written. So algorithms either require a coroutine or
require the functor to *not* be a coroutine; there's no way to have
the algorithm rewrite itself based on that.
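
To make that concrete, here's a minimal sketch. The `lazy_task` type below is
something I'm defining just so the example is self-contained; it stands in for
whatever real task<void>-style type a library would provide. The point is that
`for_each` merely calls the functor and discards what it returns, so the
coroutine bodies never run:

```
#include <algorithm>
#include <coroutine>
#include <cstdio>
#include <utility>
#include <vector>

// Minimal lazy coroutine return type, defined only to keep this sketch
// self-contained (a stand-in for a real library task<void>).
struct lazy_task {
    struct promise_type {
        lazy_task get_return_object() {
            return lazy_task{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() { return {}; }         // lazy: body runs only when resumed
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() {}
    };

    explicit lazy_task(std::coroutine_handle<promise_type> h) : handle(h) {}
    lazy_task(lazy_task&& other) noexcept : handle(std::exchange(other.handle, {})) {}
    ~lazy_task() { if (handle) handle.destroy(); }

    std::coroutine_handle<promise_type> handle;
};

int main() {
    std::vector<int> v{1, 2, 3};

    // The functor is a coroutine, so each call just produces a suspended
    // lazy_task. std::for_each invokes the functor and throws the result
    // away; it has no way to co_await it, so none of these bodies execute.
    std::for_each(v.begin(), v.end(), [](int x) -> lazy_task {
        std::printf("%d\n", x); // never reached
        co_return;
    });
}
```

Swapping in a real library task type wouldn't change the outcome; nothing
inside `for_each` knows how to await the value it gets back.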

It's a case where stackful coroutines are just better than stackless:
functions can arbitrarily *force* code higher up in the callstack to
suspend and resume without that code knowing it's happening.

In any case, none of the cases you've shown here are like that. None
of these functions are being given a potentially asynchronous process
that they may or may not await on. In your cases, the question is
whether to treat *all functions* that your function calls as
asynchronous processes to await on.

I believe that is what people are talking about when they say that you
shouldn't want this. It really feels like you're trying to force
stackless coroutines to work like stackful coroutines.

> The method of our refactoring is to change a normal function into a coroutine function, which changes the return type from int to task<int>, and to change a normal function call into a co_await expression.
>
> Note that we don't change every function into a coroutine function or every function call into a co_await expression. Maybe this is a silly note.
>
> And after we refactored the code, we found that coroutines give a great performance improvement when the concurrency is very, very high. But when the concurrency is not so high, the performance is poor.
>
> > I'm also curious to know what situations they find coroutines
> > performing poorly for.
>
> From our analysis, the reason coroutines perform poorly is the big try-catch statement inserted by the coroutine standard. It is introduced by the design, and the compiler can't do much about it. And we also know it is much, much harder to change the exception specification.
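
For reference, the try-catch in question is the one the standard specifies
around every coroutine body. A simplified outline of the rewrite described in
[dcl.fct.def.coroutine], with the standard's exposition-only names kept as
placeholders (so this is pseudocode, not compilable C++):

```
{
    promise_type promise(/* promise constructor arguments */);
    try {
        co_await promise.initial_suspend();
        // ... the original function body goes here ...
    } catch (...) {
        if (!initial_await_resume_called)
            throw;
        promise.unhandled_exception();
    }
    // final-suspend:
    co_await promise.final_suspend();
}
```

The catch-all is there so that an exception escaping the body is routed to
promise.unhandled_exception() instead of propagating out of the coroutine
normally.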

I was looking for something more specific in terms of the actual code,
not the details of how the compiler generated non-optimal assembly.
And specifically, I'm trying to understand the *meaning* of the code,
not just a bunch of no-name functions that call each other. I want to
know the details of what you're trying to do that led to the cases of
both good performance and bad performance.

> > I'm curious to see a complex example too.
>
> I think it is hard to give a concise example in code, but let me try.
>
> Before refactoring, the code looks like:
> ```
> int funcA(....) {
>     // some logic
>     int v = funcB(...);
>     // other logic...
> }
> int funcB(....) {
>     // some logic
>     int v = funcC(...);
>     // other logic...
> }
> int funcC(...) {
>     // an actual asynchronous situation
>     ...
> }
> ```
>
> After refactoring:
> ```
> task<int> funcA(....) {
>     // some logic
>     int v = co_await funcB(...);
>     // other logic...
> }
> task<int> funcB(....) {
>     // some logic
>     int v = co_await funcC(...);
>     // other logic...
> }
> task<int> funcC(...) {
>     // an actual asynchronous situation
>     co_await something;
>     ...
>     co_return something;
> }
> ```
> And we want to:
>
> ```
> template<bool UseCoro>
> CondCoro<UseCoro, int> funcA(....) {
>     // some logic
>     int v = MAYCOAWAITEXPR(UseCoro, funcB, ...);
>     // other logic...
> }
> template<bool UseCoro>
> CondCoro<UseCoro, int> funcB(....) {
>     // some logic
>     int v = MAYCOAWAITEXPR(UseCoro, funcC, ...);
>     // other logic...
> }
> task<int> funcC(...) {
>     // an actual asynchronous situation
>     co_await something;
>     ...
>     co_return something;
> }
> ```
>
> > I brought this up on the cpplang Slack (in the #coroutines channel). Personally I tend to agree with your take, but the consensus in Slack seemed to be "You shouldn't want that."
>
> Here my point is that it is natural to make constexpr-if work for coroutines. And this suggestion looks really harmless. So I am curious why people don't want to make constexpr-if work for coroutines.

Because a function should either be a coroutine or not be a coroutine.

Consider something like `std::copy`. If the range type it is given is
contiguous, and the value type is trivially copyable, then it can do a
memcpy on its contents rather than step-by-step assignment. That's
good, and `if constexpr` makes that pretty easy to write. But that
doesn't change the *nature* of the function, nor does it change how
you fundamentally interact with it.
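
For instance, a minimal sketch of that kind of dispatch (not the real
std::copy; copy_n_sketch is a made-up name for illustration):

```
#include <cstddef>
#include <cstring>
#include <type_traits>

// The branch is an internal optimization: callers see the same signature and
// the same observable behavior either way.
template <class T>
T* copy_n_sketch(const T* first, std::size_t n, T* out) {
    if constexpr (std::is_trivially_copyable_v<T>) {
        std::memcpy(out, first, n * sizeof(T)); // bulk copy
    } else {
        for (std::size_t i = 0; i < n; ++i)     // element-by-element assignment
            out[i] = first[i];
    }
    return out + n;
}
```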

Being a coroutine *does* change these things. External code has to be
written differently for the coroutine version than the non-coroutine
version. That's what your macro is for after all. Just because you
found a way to minimize those differences doesn't mean they aren't
*there*.

You are writing two separate functions that you want to have the same
name. This feels like an improper use of overloading/template
instantiation. It feels a lot like `vector<bool>`, which has special
interfaces and different iterator categories separate from
`vector<AnythingElse>`. You can't just swap a `vector<bool>` in
without thinking about it. So it shouldn't be spelled "vector<bool>".

Just as it should be for these functions.

Received on 2021-01-21 09:23:48