std-proposals: Re: Re: Delay the judgement for coroutine function after the instantiation of template entity.

From: chuanqi.xcq <yedeng.yd_at_[hidden]>
Date: Fri, 22 Jan 2021 15:11:59 +0800

>> To Jason McKesson
> It's difficult to respond to you when your post intermingles responses
> to different posts that are making different points. If you don't want
> to send multiple e-mails, then at least make it clear which parts of
> the text are responding to which people. Don't jumble them up
> together.

       What tool do you use to send email? I have to copy the text and add the '>' symbol at the start of each line by hand. It looks like the quoted paragraphs in your replies are generated by your tool.

>> To Jason McKesson
> > > Let's say that we have a function X which is, by its nature, an
> > asynchronous coroutine function. This means that X has to schedule its
> > resumption based on some external asynchronous process, like doing
> > file/networking IO, etc. Doing this is *why* you made the function a
> > coroutine; it is the nature of X that it waits on something
> > asynchronously. And let's say that we have some function Y which gets
> > called by X.
>
> The picture I see is that X is a coroutine and Y is a caller of X that needs to do something only after X has finished its job. So Y should be a coroutine too, shouldn't it?

> You got that backwards; Y "gets called by X". And no, the caller of X
> doesn't need to be a coroutine *either*. At some point, every
> coroutine has to be called by some function that is not a coroutine.

        There really is a gap in our understanding here. Let's say that A is a coroutine and B is a function that has something to do only after A has finished its job. Then we need to co_await A in the body of B, so B becomes a coroutine. And if there is a function C that needs to wait for B to finish its job, C becomes a coroutine too. This is how the chains of coroutines in our code come about.

>> To Jason McKesson
> And most of the code between these two points *does not care* if suspending happens or not.

       I agree with the statement literally. To be precise: most of the code between these two points *doesn't care* whether suspension happens or not. But that code does care about whether the callee has finished its job.

>> To Jason McKesson
> All of this adds up to a textbook example of when to use stackful
> coroutines. They can suspend through *anything*; none of the code
> between the source and the receiver needs to know they are in a
> coroutine.

       I agree with this. In fact, we have experimented with using stackful coroutines to refactor our code.

>> To Jason McKesson
> So what we come down to is this: you want this feature so that you can
> (ab)use stackless coroutines in a scenario that is almost tailor-made
> for stackful coroutines. And stackful coroutines would almost
> certainly alleviate your performance problems in less asynchronous
> cases, since each function in the graph won't be its own heap
> allocation.

       But I can't agree that we are abusing stackless coroutines. Refactoring the code into stackless coroutines gave us a large performance gain and better stability under high concurrency. In fact, all of us consider refactoring this code with stackless coroutines a successful experiment.

>> To Jason McKesson
> So I would say that this is not a good motivating case for the change
> to the standard, since you're only encountering this problem because
> you're writing your code wrong.

      Same as above: we don't think it is wrong to use stackless coroutines to refactor our code.

>> To Jason McKesson
> You misunderstood my point. In one instantiation, you had a function
> that returned an `int`; in its coroutine form, it returned a
> `task<int>`. It doesn't much matter if the coroutine form is a "true"
> coroutine or just something that returns a `task<int>`. What matters
> is that the way the caller *uses* the function must change.

> Broadly speaking, if you have a template function, instantiating it
> with different parameters may change its return type, but it shouldn't
> unexpectedly change the basic way you *interact* with that kind of
> type. And I know there are functions in the standard library that
> violate those rules (`any_cast` being the most prominent). But it's
> not a thing we should encourage.

      As I understand it, your point is that we *shouldn't* change the return type through template parameters, since that is bad practice. But there are really two separate things here: the static-if we want is a language feature, and the example we gave above is just one application of it. We can also give an example in which the return type of the template function does not change. For example, both versions of funcA below return Task<int>:
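      ```
      template<bool UseCoro>
      Task<int> funcA(...) {
          // Proposed semantics. Under the current rules funcA is a coroutine because its
          // body contains co_await/co_return, so the plain `return` below is ill-formed.
          if constexpr (UseCoro)
              co_return co_await funcB<UseCoro>(...);
          else
              return Task<int>(funcB<UseCoro>(...).get()); // We can implement `get` with a condition variable or stackful coroutines.
      }
      ```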
      The caller of funcA always gets a Task<int> and can use either co_await or `get` to obtain the value. That said, I don't have a concrete use case for this in our work right now.

      All I want to say is that static-if is a language feature, and users can apply it in their own applications. Even if you consider all the applications I have given to be bad practice and think the feature shouldn't be enabled, I still find it odd that constexpr-if doesn't work for coroutines.

>> To Jason McKesson
> All of this adds up to a textbook example of when to use stackful
> coroutines. They can suspend through *anything*; none of the code
> between the source and the receiver needs to know they are in a
> coroutine.

      I'm replying to this paragraph again about something unrelated to the previous discussion. In my work experience, stackful coroutines are really easy to use and easy to understand, while stackless coroutines are exactly the opposite. Every time we want to move our C++ projects to C++20 coroutines, we have to spend months refactoring the code, only to get performance gains in some cases. But as you said, stackful coroutines would perform better when the concurrency isn't high. So the question I want to ask is: in what situations should we use stackless coroutines instead of stackful ones? Or perhaps we should discuss this question elsewhere.

      Thanks,
      Chuanqi

------------------------------------------------------------------
From: Jason McKesson via Std-Proposals <std-proposals_at_[hidden]>
Sent: Friday, January 22, 2021 13:57
To: [none]
Cc: Jason McKesson <jmckesson_at_[hidden]>; std-proposals <std-proposals_at_[hidden]>
Subject: Re: [std-proposals] Re: Delay the judgement for coroutine function after the instantiation of template entity.

It's difficult to respond to you when your post intermingles responses
to different posts that are making different points. If you don't want
to send multiple e-mails, then at least make it clear which parts of
the text are responding to which people. Don't jumble them up
together.

On Thu, Jan 21, 2021 at 10:57 PM chuanqi.xcq
<yedeng.yd_at_[hidden]> wrote:
>
> > Because a function should either be a coroutine or not be a coroutine.
>
> I agree with this statement. But a template function isn't a function.

When I said "a function", I meant "that code you wrote that has a
particular name." And don't rules-lawyer about "well actually template
arguments are part of its name;" you know what I'm talking about.

> > Being a coroutine *does* change these things. External code has to be
> written differently for the coroutine version than the non-coroutine
> version.
>
> This may be one side effect. But my thought is that the template argument is part of the function declaration. And I don't think it is easy to tell whether a function is a coroutine from its declaration anyway.
>
> For example, the following code:
> ```
> SomeType func();
> ```
>
> Can we tell whether func is a coroutine from its declaration? No, we need to look at the implementation or the comments to decide.

You misunderstood my point. In one instantiation, you had a function
that returned an `int`; in its coroutine form, it returned a
`task<int>`. It doesn't much matter if the coroutine form is a "true"
coroutine or just something that returns a `task<int>`. What matters
is that the way the caller *uses* the function must change.

Broadly speaking, if you have a template function, instantiating it
with different parameters may change its return type, but it shouldn't
unexpectedly change the basic way you *interact* with that kind of
type. And I know there are functions in the standard library that
violate those rules (`any_cast` being the most prominent). But it's
not a thing we should encourage.

> > Let's say that we have a function X which is, by its nature, an
> asynchronous coroutine function. This means that X has to schedule its
> resumption based on some external asynchronous process, like doing
> file/networking IO, etc. Doing this is *why* you made the function a
> coroutine; it is the nature of X that it waits on something
> asynchronously. And let's say that we have some function Y which gets
> called by X.
>
> The picture I see is that X is a coroutine and Y is a caller of X that needs to do something only after X has finished its job. So Y should be a coroutine too, shouldn't it?

You got that backwards; Y "gets called by X". And no, the caller of X
doesn't need to be a coroutine *either*. At some point, every
coroutine has to be called by some function that is not a coroutine.

> > I was looking for something more specific in terms of the actual code,
> not the details of how the compiler generated non-optimal assembly.
> And specifically, I'm trying to understand the *meaning* of the code,
> not just a bunch of no-name functions that call each other. I want to
> know the details of what you're trying to do that led to the cases of
> both good performance and bad performance.
>
> Because of confidentiality, I can't show the original code, and I don't think it is necessary anyway. Here is a more detailed example that I think is enough for the discussion:
> ```
> // `auto` is shorthand here: a coroutine cannot have a deduced return type,
> // so real code needs a concrete task type in its place.
> auto ReaderByPrefix(const std::string& prefixKey, const std::vector<std::string>& suffixKeys,
>                     CacheType cache, MemoryPoolPool* pool, MetricsCollector* metricsCollector) {
>     assert(pool);
>     KeyType keyHash(0);
>     if (!GetKeyHash(prefixKey, keyHash))
>     {
>         if (metricsCollector)
>             metricsCollector->EndQuery(keyHash, "Not find Hash for PrefixKey: ", prefixKey);
>         co_return Iterator(pool);
>     }
>     auto Res = co_await ReaderByKey(keyHash, suffixKeys, cache, pool, metricsCollector);
>     tryInsertToCache(Res, cache);
>     co_return Res;
> }
>
> auto ReaderByKey(KeyType keyHash, const std::vector<std::string>& suffixKeys, CacheType cache,
>                  MemoryPoolPool* pool, MetricsCollector* metricsCollector)
> {
>     assert(pool);
>     std::vector<uint64_t> skeyHashs;
>     if (!GetSKeyHashVec(suffixKeys, skeyHashs))
>     {
>         if (metricsCollector)
>             metricsCollector->EndQuery(keyHash, "Not find SKeyHash for vec for hash: ", keyHash);
>         co_return Iterator(pool);
>     }
>     auto Res = co_await LookupVecs(keyHash, std::move(skeyHashs), cache, pool, metricsCollector);
>     tryInsertToCache(Res, cache);
>     co_return Res;
> }
>
> auto LookupVecs(KeyType keyHash, std::vector<uint64_t> skeyHashs, CacheType cache,
>                 MemoryPoolPool* pool, MetricsCollector* metricsCollector) {
>     auto Res = co_await SearchInCache(cache, keyHash, skeyHashs, pool, metricsCollector);
>     if (Res) {
>         if (metricsCollector)
>             metricsCollector->record(Res);
>         co_return Res;
>     }
>     auto Reader = pool->getReader();
>     auto SearchType = keyHash.getType();
>     switch (SearchType) {
>     case ReaderType1: {
>         // ReaderType1::LookupImpl may block
>         auto Res = co_await static_cast<ReaderType1*>(Reader)->LookupImpl(keyHash, skeyHashs, pool, metricsCollector);
>         tryInsertToCache(Res, cache);
>         break;
>     }
>     case ReaderType2:
>         // and so on;
>         break;
>     default: {
>         // ...
>     }
>     }
>
>     if (metricsCollector)
>         metricsCollector->EndQuery(Res, "End of query");
>     co_return Iterator(pool);
> }
> ```
>
> Our project is a storage and query library, so there are many interfaces that are called by users from the upper layer. Once a user starts a query, we need to run the query first and then return the result to the user.

So, let me break down my understanding of your code from a quick
inspection. You have two basic asynchronous operations:
`SearchInCache` (whose asynchronous nature I will assume is not due to
externally-defined code), and `LookupImpl`, which is user-provided
code that may or may not actually do asynchronous stuff. These are the
two terminal async operations; whenever this call graph suspends, it
will ultimately be because of one of those operations.

The source of the suspensions is deep in the call graph. There is a
very great deal of code between the source of the suspensions and the
actual root (the last caller that *isn't* a coroutine). And most of
the code between these two points *does not care* if suspending
happens or not.

All of this adds up to a textbook example of when to use stackful
coroutines. They can suspend through *anything*; none of the code
between the source and the receiver needs to know they are in a
coroutine.

So what we come down to is this: you want this feature so that you can
(ab)use stackless coroutines in a scenario that is almost tailor-made
for stackful coroutines. And stackful coroutines would almost
certainly alleviate your performance problems in less asynchronous
cases, since each function in the graph won't be its own heap
allocation.

So I would say that this is not a good motivating case for the change
to the standard, since you're only encountering this problem because
you're writing your code wrong.

-- 
Std-Proposals mailing list
Std-Proposals_at_[hidden]
https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals

Received on 2021-01-22 01:12:05