C++ Logo

std-proposals

Advanced search

Re: [std-proposals] Do not accept std::generator, please.

From: Lewis Baker <lewissbaker_at_[hidden]>
Date: Sun, 4 Sep 2022 20:55:13 +0930
On Sun, Sep 4, 2022 at 3:46 PM Nikl Kelbon <kelbonage_at_[hidden]> wrote:

> > The design of std::generator allows an *optional* allocator type to be
> specified.
> It is possible in my realisation too. Default allocator is
> std::allocator<std::byte>
> In case when you want to not specify default constructible allocator you
> can do
>
> generator<int> foo_coro(/*arguments with alloc*/)
> generator<int> foo(/*arguments without alloc*/) {
> return foo_coro(std::allocator_arg, Alloc{}, Args...);
> }
> In this case if allocator is empty compiler must optimize frame and do not
> store allocator arg and alloc on frame
> I dont see why it must be part of generator type and pollute the
> interface + break vector<generator<X>> just for syntax sugar
>

What about the current design breaks vector<generator<X>>?

I can write:

std::vector<std::generator<int>> v;
v.push_back([]() -> std::generator<int> { co_yield 42; }());
v.push_back([](std::allocator_arg_t, SomeAlloc) { co_yield 42;
}(std::allocator_arg, {}));

It's possible we could add an implicit/explicit cast from generator<Ref,
Val, Alloc> to generator<Ref, Val, void>.

The point of the Alloc template arg was to make it more ergonomic to write
coroutine functions that use custom stateless allocators without having to
pollute the signature of the functions with allocator_arg arguments.

> There is no extra type-erasure needed for allocators.
> > Coroutines are themselves inherently type-erased
>
> I know that coroutines is a type erasure, but it is not clear that
> implementation will not use additional type erasure and why Alloc = void
> called 'type erasure case'
>

It's called the type-erasure case because the allocator is type-erased.
The allocator can be type-erased without additional type-erasure above what
the coroutine provides.
It is only the deallocation that is type-erased. The allocation happens
with static knowledge of the allocator type.
Whether or not the implementation chooses to do additional type-erasure or
takes advantage of the coroutine's inherent type-erasure is QoI.


> > There is indeed some overhead for supporting the recursive generator
> use-case
>
> This is why i say "Pay for what you dont use"
>

You are free to use a different generator type if you don't want the
overhead of the recursive-generator type.

If you think that we should also add a non-recursive generator type to the
standard library then I'm sure that Library Evolution would welcome a paper
on that.


> > Current compilers generally struggle to inline/optimise operator++() as
> this necessarily needs an extra level of indirection to be able to resume
> different coroutines depending on what is the current leaf coroutine
>
> Because it is recursive generator. Is there a benchmarks where just
> generator will lose perfomance compare to recusrive generator?
>

See "Performance & Benchmarks" section of P2168R3
<https://wg21.link/P2168R3>


> Recursive generator has overhead even when compiler optimized operator++,
> we want to create many generators, use them like fast-write output / input
> iterators etc etc
>

There is _some_ overhead, yes. Is it significant?


> Let's image that recursive generator in recursive case 10% better then
> just generator, but recursive generator used in 1% use cases when just
> generator needed in 99% cases. So we will have perfomance loss in general.
>

In cases where the generator can be completely inlined there is a
significant win.
Non-recursive generator is more likely to be inlined here.

In cases where the generator is not inlined there is very little difference
in performance between recursive and non-recursive (maybe 1-2%).


> > Would you expect begin() to advance the iteration to the next element?
> It can be just documented, that generator::begin uses resume, but why it
> need to be UNDEFINED behavior?
>

If the generator is already at the end() then calling begin() and having
that unconditionally call resume() would be UB.
To guard against this you'd need to conditionally call resume(), adding a
branch to begin().
I think I recall some feedback during the review process that this branch
was unnecessary.


> I dont see any case when compiler/implementation will optimize something
> using this ub. Why not impl defined eventually?
>

Making calling begin() multiple times have implementation-defined behaviour
would make code that relies on that non-portable.


> And what about case when generator created by move from other generator?
> Will it be UB to use begin first time for this generator?
>

Generally, you are not allowed to move a view/range after you start
iterating over it.

If you have not started iterating over the generator, move it, then call
begin() - that should be fine.


>
> for(auto& x : gen) {
> handle(x);
> if(token.stop_requested())
> break;
> }
> ... // time to continue
> for(auto& x :gen) // UB
>

Yes, if you are writing a generic algorithm across input_range objects then
attempting to iterate over them twice like this would be UB for at least
some types that model input_range.
I don't think input_range guarantees that you can call begin() multiple
times, only forward_range or stronger gives you this.

If you are going to try to continue iterating over an input_range in
different for-loops then just use an iterator and do the iteration manually.


> > You are not handling alignment requirements of the allocator object
> here. Need to pad to an address that is a multiple of alignof(Alloc).
>
> I know and i leave a big comment about it. I dont really know why operator
> new/delete used, why not just "allocate_frame" function like
> await_suspend/await_resume/get_return_object?
>

I guess it may allow some reuse of wording around allocation/deallocation
functions but otherwise there's no reason why it needed to be spelled
'operator new'.


> Its really logically incorrect, this operator new used for corotuine
> frame, but defined for promise type.
>

Yes. This is a known inconsistency.


> And i dont see any guarantees about frame alignment, what alignment i must
> use?)
>

Generally, the caller of an allocation function should be able to guarantee
that it can construct any object of the requested size where that object
does not have new-extended-alignment.
So if it asks to allocate >=8 bytes then generally it should be 8 byte
aligned.
If it asks for >=16 bytes then it should be 16 byte aligned, etc.
Up to __STDCPP_DEFAULT_NEW_ALIGNMENT__ which is the maximum alignment that
the default operator new should be assumed to provide.
Above this alignment, implementations are supposed to call the align_val_t
overloads to obtain memory with greater alignment.
However coroutines do not currently support dispatching to align_val_t
overloads.
So the compiler can generally only assume at most
__STDCPP_DEFAULT_NEW_ALIGNMENT__ and if the frame requires greater
alignment then they over-allocate and dynamically align within the returned
buffer.


> Why i cant use alloc which user provided and it is stored on frame? It is
> overhead.
>

The problem here is that during deallocation the deallocation function
called _after_ the parameter-copies stored in that allocated frame are
destroyed.
So you need to store an additional copy of a stateful allocator outside of
the frame so that it's still available at the time when the deallocation
function is called.


> And main problem - why ALLOCATOR support required?
>

Lots of code-bases have existing custom-allocators that they have
implemented and they want to be able to reuse them to customise allocations
of things.
It is useful to be able to reuse them to customise allocation of generator
coroutine frames.


> When it must be memory resource support. In this case it must be exactly
> one non typed allocation. Use allocator for this is too much.
>

The reference implementation by Casey rebinds the allocator to a trivial
type that has size/alignment of __STDCPP_DEFAULT_NEW_ALIGNMENT__ and then
uses the allocator to allocate an array of those objects big enough to host
the requested size.


> What about some std:: type which will store this logic like "Im exactly
> one allocation and deallocation from this object X, it can be memory
> resource or allocator, and im a tag-type like std::allocator_arg_t"
>

I'm really not following what you're suggesting here.


> generator<int> x(std::one_allocation_from<Alloc>);
>
> > If you are storing the promise-pointer anyway, you could store a pointer
> to the base-class instead.
> i dont see why std::coroutine_handle<void>::promise is not exist.
>


> MSVC implementation dont use nothing from promise type to get promise(just
> reinterpret_cast), llvm implementation uses alignment, but why dont provide
> alignment in promise method?
>

The MSVC ABI places the alignment padding before the stable-ABI part
(resume-fn/destroy-fn pointers) which allows it to place the promise
immediately after the function-pointers.
The clang implementation places the alignment padding between the
function-pointers and promise and so we need to know the promise alignment
in order to calculate the address of the promise given the address of the
function-pointers.
The clang implementation could be changed to use the same ABI as MSVC - in
which case it would be easier to implement a
coroutine_handle<void>::promise() that returned the address of the promise.

A paper would need to be written proposing to add this to
coroutine_handle<void> and implementations for Clang and GCC would need to
be changed so that their ABI allowed calculation of the promise address
without knowing the promise-type.


> > currently implemented coroutine ABIs
> I dont see how to do not change ABI with coroutines. Changing code in
> coroutine may change ABI, optimization in corotuine frame may change ABI
>

It is only the ABI-stable portion that needs to be considered here. ie. the
ABI of the coroutine that coroutine_handle interacts with.
The rest of the layout of the frame is type-erased and is handled by the
relevant resume/destroy function pointers.
Changing the body of a coroutine function does not change the ABI of the
coroutine-frame.

It is not an ABI break for those other aspects of the layout to change
since things in other translation units will only ever interact with them
through the coroutine_handle type, which only interacts with the ABI-stable
part of the coroutine-frame.


>
> вс, 4 сент. 2022 г. в 06:07, Lewis Baker <lewissbaker_at_[hidden]>:
>
>>
>>
>> On Sat, Sep 3, 2022 at 6:19 PM Nikl Kelbon via Std-Proposals <
>> std-proposals_at_[hidden]> wrote:
>>
>>> std::generator has huge problems.
>>> I will briefly list them:
>>> 1. The generator must not depend on the allocator (only the promise_type
>>> depends), but std::generator has a template argument for alloc, in
>>> particular, this prohibits adding 2 generators that allocated memory
>>> differently into a container. It is possible to implement this without type
>>> erasure.
>>> Link to example implementation:
>>>
>>> https://github.com/kelbon/dd_generator/blob/4053fcff456c885d9330708d2e67f103c4212319/include/dd_generator.hpp#L264
>>>
>>
>> The design of std::generator allows an *optional* allocator type to be
>> specified.
>>
>> If you do not specify it then the allocator is determined solely by the
>> arguments.
>> If there is a std::allocator_arg + allocator object pair passed as the
>> first two arguments of the coroutine then it will use that.
>> Otherwise, it will fall back to using std::allocator by default.
>> In this way it is possible to create a container of std::generator
>> objects that have different allocator types by just not specifying the
>> allocator in the template argument.
>>
>> If the template argument is specified then either:
>> - the allocator is stateless in which case it uses a default-constructed
>> allocator and avoids the user needing to explictly specify the allocator
>> - the allocator is stateful and the caller still needs to pass an
>> instance of the allocator using std::allocator_arg
>>
>> The main motivation for allowing the allocator as a template argument to
>> std::generator is to allow the stateless-allocator use-case to be more
>> ergonomic.
>> e.g. so we can define an alias
>>
>> template<typename Ref, typename Val = remove_cvref_t<Ref>>
>> using my_generator = std:::generator<Ref, Val, my_stateless_allocator>;
>>
>> And then can just use my_generator throughout my program and it will just
>> use the custom allocator without having to worry about passing
>> std::allocator_arg to every coroutine function.
>>
>> There is no additional type-erasure necessary for supporting these
>> use-cases.
>> We can just leverage the inherent type-erasure of coroutines in general
>> to handle this.
>> e.g. see
>> https://github.com/lewissbaker/generator/blob/main/include/__generator.hpp#L277-L330
>>
>> 2. Overhead for ueslss default type erasure(default behavior) and stack
>>> in generator (used only for element_of, which is basically for(auto&& v :
>>> gen) co_yield v;
>>> Its just uselss
>>> You pay for what you dont use.
>>>
>>
>> What "useless default type erasure" are you referring to here?
>> There is no extra type-erasure needed for allocators.
>> Coroutines are themselves inherently type-erased (which we can take
>> advantage of).
>>
>> There is indeed some overhead for supporting the recursive generator
>> use-case.
>> Current compilers generally struggle to inline/optimise operator++() as
>> this necessarily needs an extra level of indirection to be able to resume
>> different coroutines depending on what is the current leaf coroutine.
>>
>> The recursive generator is more efficient for recursive use-cases but
>> less efficient for non-recursive use-cases.
>> Supporting 'co_yield elements_of(x)' is not possible without the
>> recursive support.
>>
>> We discussed whether it would be better to add a non-recursive generator
>> or a recursive generator.
>> I think the consensus was that it was much harder to write your own
>> recursive generator and so that had more value to exist in the standard
>> library.
>> We can add another non-recursive generator type in future if compilers
>> are unable to optimise the recursive generator sufficiently.
>>
>>
>>> 3. It is undefined behavior to call begin for std::generator twice. But
>>> iteraing not all values in one loop is a very logical action and expected
>>> behavior - just keep generating values
>>> Besides, such an error is extremely non-obvious when using ranges or
>>> range based for loop.
>>>
>>
>> This is a typical restriction provided by many other input_range
>> implementations.
>>
>> Would you expect begin() to advance the iteration to the next element?
>> Or would you expect to be able to call begin() multiple times and each
>> time just get a copy of the iterator pointing to the same element?
>> Supporting either of these would potentially affect performance - either
>> requiring more state to be held in the generator to keep track of whether
>> this was the first call to begin() or not, or require an extra indirection
>> when starting the coroutine for the first time (since a subsequent call to
>> begin() might need to resume a different coroutine than the top-level
>> coroutine).
>>
>>
>>> 4. interface that is very obscure to the user and requires knowledge of
>>> implementation to write efficient code
>>> No one will write generator<const T&> foo(); without knowing why it is
>>> needed and why it is better for perfmornace
>>>
>>
>> Can you please expand on what part of this you think is "obscure"?
>> The user needs to make a choice about whether they want to yield a
>> sequence of lvalues, const-lvalues, rvalues.
>> This will affect what kinds of values they can pass to co_yield and
>> whether they incur extra copies/etc.
>>
>>
>>> But there are way to write generator without such problems
>>> https://github.com/kelbon/dd_generator
>>>
>>
>> Line 43
>> <https://github.com/kelbon/dd_generator/blob/9e33ee704444cc7f8d36f67c7358c9d19e770a0d/include/dd_generator.hpp#L43>:
>> You are not handling alignment requirements of the allocator object here.
>> Need to pad to an address that is a multiple of alignof(Alloc).
>> Line 32,48: You also need to check for
>> std::allocator_traits<Alloc>::is_always_equal
>>
>> The need for any_coroutine_handle (which stores the coroutine_handle and
>> the promise pointer) and reinterpret_cast to get back to the 'value'
>> data-member seems like a fairly unsafe way of implementing this.
>> If you are storing the promise-pointer anyway, you could store a pointer
>> to the base-class instead.
>> e.g. see my generator<Ref, Val, use_allocator_arg> implementation
>> <https://github.com/lewissbaker/generator/blob/e7c6c1c3009f0e9cf32add465b16f6295bfb9ac4/include/__generator.hpp#L789>
>>
>> Note that we cannot have coroutine_handle<void> return a pointer to the
>> promise with the currently implemented coroutine ABIs because the offset
>> between the frame pointer and the promise address is not necessarily a
>> constant - there may be padding added to align a promise type with
>> alignment >= 2*sizeof(void*). We would either need to change the ABI so
>> that the padding was placed before the function-pointers, or we'd need to
>> store an extra field to record the offset of the promise so we can compute
>> its address correctly.
>>
>> I would also worry about the void* obtained from the
>> coroutine_handle<void>::promise() being used in an unsafe way
>> - it could be confused with address()/from_address()
>> - you need to use reinterpret_cast<> which is difficult to get right
>> without invoking UB
>> These are minor concerns, however.
>>
>> - Lewis.
>>
>>

Received on 2022-09-04 11:25:26