Date: Sun, 4 Sep 2022 11:16:41 +0500
> The design of std::generator allows an *optional* allocator type to be
specified.
It is possible in my realisation too. Default allocator is
std::allocator<std::byte>
In case when you want to not specify default constructible allocator you
can do
generator<int> foo_coro(/*arguments with alloc*/)
generator<int> foo(/*arguments without alloc*/) {
return foo_coro(std::allocator_arg, Alloc{}, Args...);
}
In this case if allocator is empty compiler must optimize frame and do not
store allocator arg and alloc on frame
I dont see why it must be part of generator type and pollute the
interface + break vector<generator<X>> just for syntax sugar
> There is no extra type-erasure needed for allocators.
> Coroutines are themselves inherently type-erased
I know that coroutines is a type erasure, but it is not clear that
implementation will not use additional type erasure and why Alloc = void
called 'type erasure case'
> There is indeed some overhead for supporting the recursive generator
use-case
This is why i say "Pay for what you dont use"
> Current compilers generally struggle to inline/optimise operator++() as
this necessarily needs an extra level of indirection to be able to resume
different coroutines depending on what is the current leaf coroutine
Because it is recursive generator. Is there a benchmarks where just
generator will lose perfomance compare to recusrive generator? Recursive
generator has overhead even when compiler optimized operator++, we want to
create many generators, use them like fast-write output / input iterators
etc etc
Let's image that recursive generator in recursive case 10% better then just
generator, but recursive generator used in 1% use cases when just generator
needed in 99% cases. So we will have perfomance loss in general.
> Would you expect begin() to advance the iteration to the next element?
It can be just documented, that generator::begin uses resume, but why it
need to be UNDEFINED behavior? I dont see any case when
compiler/implementation will optimize something using this ub. Why not impl
defined eventually?
And what about case when generator created by move from other generator?
Will it be UB to use begin first time for this generator?
for(auto& x : gen) {
handle(x);
if(token.stop_requested())
break;
}
... // time to continue
for(auto& x :gen) // UB
> You are not handling alignment requirements of the allocator object
here. Need to pad to an address that is a multiple of alignof(Alloc).
I know and i leave a big comment about it. I dont really know why operator
new/delete used, why not just "allocate_frame" function like
await_suspend/await_resume/get_return_object?
Its really logically incorrect, this operator new used for corotuine frame,
but defined for promise type. And i dont see any guarantees about frame
alignment, what alignment i must use?)
Why i cant use alloc which user provided and it is stored on frame? It is
overhead.
And main problem - why ALLOCATOR support required? When it must be memory
resource support. In this case it must be exactly one non typed allocation.
Use allocator for this is too much.
What about some std:: type which will store this logic like "Im exactly one
allocation and deallocation from this object X, it can be memory resource
or allocator, and im a tag-type like std::allocator_arg_t"
generator<int> x(std::one_allocation_from<Alloc>);
> If you are storing the promise-pointer anyway, you could store a pointer
to the base-class instead.
i dont see why std::coroutine_handle<void>::promise is not exist. MSVC
implementation dont use nothing from promise type to get promise(just
reinterpret_cast), llvm implementation uses alignment, but why dont provide
alignment in promise method?
My implementation just demonstrates this problem
> currently implemented coroutine ABIs
I dont see how to do not change ABI with coroutines. Changing code in
coroutine may change ABI, optimization in corotuine frame may change ABI
вс, 4 сент. 2022 г. в 06:07, Lewis Baker <lewissbaker_at_[hidden]>:
>
>
> On Sat, Sep 3, 2022 at 6:19 PM Nikl Kelbon via Std-Proposals <
> std-proposals_at_[hidden]> wrote:
>
>> std::generator has huge problems.
>> I will briefly list them:
>> 1. The generator must not depend on the allocator (only the promise_type
>> depends), but std::generator has a template argument for alloc, in
>> particular, this prohibits adding 2 generators that allocated memory
>> differently into a container. It is possible to implement this without type
>> erasure.
>> Link to example implementation:
>>
>> https://github.com/kelbon/dd_generator/blob/4053fcff456c885d9330708d2e67f103c4212319/include/dd_generator.hpp#L264
>>
>
> The design of std::generator allows an *optional* allocator type to be
> specified.
>
> If you do not specify it then the allocator is determined solely by the
> arguments.
> If there is a std::allocator_arg + allocator object pair passed as the
> first two arguments of the coroutine then it will use that.
> Otherwise, it will fall back to using std::allocator by default.
> In this way it is possible to create a container of std::generator objects
> that have different allocator types by just not specifying the allocator in
> the template argument.
>
> If the template argument is specified then either:
> - the allocator is stateless in which case it uses a default-constructed
> allocator and avoids the user needing to explictly specify the allocator
> - the allocator is stateful and the caller still needs to pass an instance
> of the allocator using std::allocator_arg
>
> The main motivation for allowing the allocator as a template argument to
> std::generator is to allow the stateless-allocator use-case to be more
> ergonomic.
> e.g. so we can define an alias
>
> template<typename Ref, typename Val = remove_cvref_t<Ref>>
> using my_generator = std:::generator<Ref, Val, my_stateless_allocator>;
>
> And then can just use my_generator throughout my program and it will just
> use the custom allocator without having to worry about passing
> std::allocator_arg to every coroutine function.
>
> There is no additional type-erasure necessary for supporting these
> use-cases.
> We can just leverage the inherent type-erasure of coroutines in general to
> handle this.
> e.g. see
> https://github.com/lewissbaker/generator/blob/main/include/__generator.hpp#L277-L330
>
> 2. Overhead for ueslss default type erasure(default behavior) and stack in
>> generator (used only for element_of, which is basically for(auto&& v : gen)
>> co_yield v;
>> Its just uselss
>> You pay for what you dont use.
>>
>
> What "useless default type erasure" are you referring to here?
> There is no extra type-erasure needed for allocators.
> Coroutines are themselves inherently type-erased (which we can take
> advantage of).
>
> There is indeed some overhead for supporting the recursive generator
> use-case.
> Current compilers generally struggle to inline/optimise operator++() as
> this necessarily needs an extra level of indirection to be able to resume
> different coroutines depending on what is the current leaf coroutine.
>
> The recursive generator is more efficient for recursive use-cases but less
> efficient for non-recursive use-cases.
> Supporting 'co_yield elements_of(x)' is not possible without the recursive
> support.
>
> We discussed whether it would be better to add a non-recursive generator
> or a recursive generator.
> I think the consensus was that it was much harder to write your own
> recursive generator and so that had more value to exist in the standard
> library.
> We can add another non-recursive generator type in future if compilers are
> unable to optimise the recursive generator sufficiently.
>
>
>> 3. It is undefined behavior to call begin for std::generator twice. But
>> iteraing not all values in one loop is a very logical action and expected
>> behavior - just keep generating values
>> Besides, such an error is extremely non-obvious when using ranges or
>> range based for loop.
>>
>
> This is a typical restriction provided by many other input_range
> implementations.
>
> Would you expect begin() to advance the iteration to the next element?
> Or would you expect to be able to call begin() multiple times and each
> time just get a copy of the iterator pointing to the same element?
> Supporting either of these would potentially affect performance - either
> requiring more state to be held in the generator to keep track of whether
> this was the first call to begin() or not, or require an extra indirection
> when starting the coroutine for the first time (since a subsequent call to
> begin() might need to resume a different coroutine than the top-level
> coroutine).
>
>
>> 4. interface that is very obscure to the user and requires knowledge of
>> implementation to write efficient code
>> No one will write generator<const T&> foo(); without knowing why it is
>> needed and why it is better for perfmornace
>>
>
> Can you please expand on what part of this you think is "obscure"?
> The user needs to make a choice about whether they want to yield a
> sequence of lvalues, const-lvalues, rvalues.
> This will affect what kinds of values they can pass to co_yield and
> whether they incur extra copies/etc.
>
>
>> But there are way to write generator without such problems
>> https://github.com/kelbon/dd_generator
>>
>
> Line 43
> <https://github.com/kelbon/dd_generator/blob/9e33ee704444cc7f8d36f67c7358c9d19e770a0d/include/dd_generator.hpp#L43>:
> You are not handling alignment requirements of the allocator object here.
> Need to pad to an address that is a multiple of alignof(Alloc).
> Line 32,48: You also need to check for
> std::allocator_traits<Alloc>::is_always_equal
>
> The need for any_coroutine_handle (which stores the coroutine_handle and
> the promise pointer) and reinterpret_cast to get back to the 'value'
> data-member seems like a fairly unsafe way of implementing this.
> If you are storing the promise-pointer anyway, you could store a pointer
> to the base-class instead.
> e.g. see my generator<Ref, Val, use_allocator_arg> implementation
> <https://github.com/lewissbaker/generator/blob/e7c6c1c3009f0e9cf32add465b16f6295bfb9ac4/include/__generator.hpp#L789>
>
> Note that we cannot have coroutine_handle<void> return a pointer to the
> promise with the currently implemented coroutine ABIs because the offset
> between the frame pointer and the promise address is not necessarily a
> constant - there may be padding added to align a promise type with
> alignment >= 2*sizeof(void*). We would either need to change the ABI so
> that the padding was placed before the function-pointers, or we'd need to
> store an extra field to record the offset of the promise so we can compute
> its address correctly.
>
> I would also worry about the void* obtained from the
> coroutine_handle<void>::promise() being used in an unsafe way
> - it could be confused with address()/from_address()
> - you need to use reinterpret_cast<> which is difficult to get right
> without invoking UB
> These are minor concerns, however.
>
> - Lewis.
>
>
specified.
It is possible in my realisation too. Default allocator is
std::allocator<std::byte>
In case when you want to not specify default constructible allocator you
can do
generator<int> foo_coro(/*arguments with alloc*/)
generator<int> foo(/*arguments without alloc*/) {
return foo_coro(std::allocator_arg, Alloc{}, Args...);
}
In this case if allocator is empty compiler must optimize frame and do not
store allocator arg and alloc on frame
I dont see why it must be part of generator type and pollute the
interface + break vector<generator<X>> just for syntax sugar
> There is no extra type-erasure needed for allocators.
> Coroutines are themselves inherently type-erased
I know that coroutines is a type erasure, but it is not clear that
implementation will not use additional type erasure and why Alloc = void
called 'type erasure case'
> There is indeed some overhead for supporting the recursive generator
use-case
This is why i say "Pay for what you dont use"
> Current compilers generally struggle to inline/optimise operator++() as
this necessarily needs an extra level of indirection to be able to resume
different coroutines depending on what is the current leaf coroutine
Because it is recursive generator. Is there a benchmarks where just
generator will lose perfomance compare to recusrive generator? Recursive
generator has overhead even when compiler optimized operator++, we want to
create many generators, use them like fast-write output / input iterators
etc etc
Let's image that recursive generator in recursive case 10% better then just
generator, but recursive generator used in 1% use cases when just generator
needed in 99% cases. So we will have perfomance loss in general.
> Would you expect begin() to advance the iteration to the next element?
It can be just documented, that generator::begin uses resume, but why it
need to be UNDEFINED behavior? I dont see any case when
compiler/implementation will optimize something using this ub. Why not impl
defined eventually?
And what about case when generator created by move from other generator?
Will it be UB to use begin first time for this generator?
for(auto& x : gen) {
handle(x);
if(token.stop_requested())
break;
}
... // time to continue
for(auto& x :gen) // UB
> You are not handling alignment requirements of the allocator object
here. Need to pad to an address that is a multiple of alignof(Alloc).
I know and i leave a big comment about it. I dont really know why operator
new/delete used, why not just "allocate_frame" function like
await_suspend/await_resume/get_return_object?
Its really logically incorrect, this operator new used for corotuine frame,
but defined for promise type. And i dont see any guarantees about frame
alignment, what alignment i must use?)
Why i cant use alloc which user provided and it is stored on frame? It is
overhead.
And main problem - why ALLOCATOR support required? When it must be memory
resource support. In this case it must be exactly one non typed allocation.
Use allocator for this is too much.
What about some std:: type which will store this logic like "Im exactly one
allocation and deallocation from this object X, it can be memory resource
or allocator, and im a tag-type like std::allocator_arg_t"
generator<int> x(std::one_allocation_from<Alloc>);
> If you are storing the promise-pointer anyway, you could store a pointer
to the base-class instead.
i dont see why std::coroutine_handle<void>::promise is not exist. MSVC
implementation dont use nothing from promise type to get promise(just
reinterpret_cast), llvm implementation uses alignment, but why dont provide
alignment in promise method?
My implementation just demonstrates this problem
> currently implemented coroutine ABIs
I dont see how to do not change ABI with coroutines. Changing code in
coroutine may change ABI, optimization in corotuine frame may change ABI
вс, 4 сент. 2022 г. в 06:07, Lewis Baker <lewissbaker_at_[hidden]>:
>
>
> On Sat, Sep 3, 2022 at 6:19 PM Nikl Kelbon via Std-Proposals <
> std-proposals_at_[hidden]> wrote:
>
>> std::generator has huge problems.
>> I will briefly list them:
>> 1. The generator must not depend on the allocator (only the promise_type
>> depends), but std::generator has a template argument for alloc, in
>> particular, this prohibits adding 2 generators that allocated memory
>> differently into a container. It is possible to implement this without type
>> erasure.
>> Link to example implementation:
>>
>> https://github.com/kelbon/dd_generator/blob/4053fcff456c885d9330708d2e67f103c4212319/include/dd_generator.hpp#L264
>>
>
> The design of std::generator allows an *optional* allocator type to be
> specified.
>
> If you do not specify it then the allocator is determined solely by the
> arguments.
> If there is a std::allocator_arg + allocator object pair passed as the
> first two arguments of the coroutine then it will use that.
> Otherwise, it will fall back to using std::allocator by default.
> In this way it is possible to create a container of std::generator objects
> that have different allocator types by just not specifying the allocator in
> the template argument.
>
> If the template argument is specified then either:
> - the allocator is stateless in which case it uses a default-constructed
> allocator and avoids the user needing to explictly specify the allocator
> - the allocator is stateful and the caller still needs to pass an instance
> of the allocator using std::allocator_arg
>
> The main motivation for allowing the allocator as a template argument to
> std::generator is to allow the stateless-allocator use-case to be more
> ergonomic.
> e.g. so we can define an alias
>
> template<typename Ref, typename Val = remove_cvref_t<Ref>>
> using my_generator = std:::generator<Ref, Val, my_stateless_allocator>;
>
> And then can just use my_generator throughout my program and it will just
> use the custom allocator without having to worry about passing
> std::allocator_arg to every coroutine function.
>
> There is no additional type-erasure necessary for supporting these
> use-cases.
> We can just leverage the inherent type-erasure of coroutines in general to
> handle this.
> e.g. see
> https://github.com/lewissbaker/generator/blob/main/include/__generator.hpp#L277-L330
>
> 2. Overhead for ueslss default type erasure(default behavior) and stack in
>> generator (used only for element_of, which is basically for(auto&& v : gen)
>> co_yield v;
>> Its just uselss
>> You pay for what you dont use.
>>
>
> What "useless default type erasure" are you referring to here?
> There is no extra type-erasure needed for allocators.
> Coroutines are themselves inherently type-erased (which we can take
> advantage of).
>
> There is indeed some overhead for supporting the recursive generator
> use-case.
> Current compilers generally struggle to inline/optimise operator++() as
> this necessarily needs an extra level of indirection to be able to resume
> different coroutines depending on what is the current leaf coroutine.
>
> The recursive generator is more efficient for recursive use-cases but less
> efficient for non-recursive use-cases.
> Supporting 'co_yield elements_of(x)' is not possible without the recursive
> support.
>
> We discussed whether it would be better to add a non-recursive generator
> or a recursive generator.
> I think the consensus was that it was much harder to write your own
> recursive generator and so that had more value to exist in the standard
> library.
> We can add another non-recursive generator type in future if compilers are
> unable to optimise the recursive generator sufficiently.
>
>
>> 3. It is undefined behavior to call begin for std::generator twice. But
>> iteraing not all values in one loop is a very logical action and expected
>> behavior - just keep generating values
>> Besides, such an error is extremely non-obvious when using ranges or
>> range based for loop.
>>
>
> This is a typical restriction provided by many other input_range
> implementations.
>
> Would you expect begin() to advance the iteration to the next element?
> Or would you expect to be able to call begin() multiple times and each
> time just get a copy of the iterator pointing to the same element?
> Supporting either of these would potentially affect performance - either
> requiring more state to be held in the generator to keep track of whether
> this was the first call to begin() or not, or require an extra indirection
> when starting the coroutine for the first time (since a subsequent call to
> begin() might need to resume a different coroutine than the top-level
> coroutine).
>
>
>> 4. interface that is very obscure to the user and requires knowledge of
>> implementation to write efficient code
>> No one will write generator<const T&> foo(); without knowing why it is
>> needed and why it is better for perfmornace
>>
>
> Can you please expand on what part of this you think is "obscure"?
> The user needs to make a choice about whether they want to yield a
> sequence of lvalues, const-lvalues, rvalues.
> This will affect what kinds of values they can pass to co_yield and
> whether they incur extra copies/etc.
>
>
>> But there are way to write generator without such problems
>> https://github.com/kelbon/dd_generator
>>
>
> Line 43
> <https://github.com/kelbon/dd_generator/blob/9e33ee704444cc7f8d36f67c7358c9d19e770a0d/include/dd_generator.hpp#L43>:
> You are not handling alignment requirements of the allocator object here.
> Need to pad to an address that is a multiple of alignof(Alloc).
> Line 32,48: You also need to check for
> std::allocator_traits<Alloc>::is_always_equal
>
> The need for any_coroutine_handle (which stores the coroutine_handle and
> the promise pointer) and reinterpret_cast to get back to the 'value'
> data-member seems like a fairly unsafe way of implementing this.
> If you are storing the promise-pointer anyway, you could store a pointer
> to the base-class instead.
> e.g. see my generator<Ref, Val, use_allocator_arg> implementation
> <https://github.com/lewissbaker/generator/blob/e7c6c1c3009f0e9cf32add465b16f6295bfb9ac4/include/__generator.hpp#L789>
>
> Note that we cannot have coroutine_handle<void> return a pointer to the
> promise with the currently implemented coroutine ABIs because the offset
> between the frame pointer and the promise address is not necessarily a
> constant - there may be padding added to align a promise type with
> alignment >= 2*sizeof(void*). We would either need to change the ABI so
> that the padding was placed before the function-pointers, or we'd need to
> store an extra field to record the offset of the promise so we can compute
> its address correctly.
>
> I would also worry about the void* obtained from the
> coroutine_handle<void>::promise() being used in an unsafe way
> - it could be confused with address()/from_address()
> - you need to use reinterpret_cast<> which is difficult to get right
> without invoking UB
> These are minor concerns, however.
>
> - Lewis.
>
>
Received on 2022-09-04 06:16:52