ISOCPP std-proposals List: Re: [std-proposals] Do not accept std::generator, please.

From: Nikl Kelbon <kelbonage_at_[hidden]>
Date: Sun, 5 Oct 2025 11:10:06 +0500

A lot has changed since my last message in this discussion, and I must
admit that back then, my arguments were not entirely convincing — and in
some points, even mistaken.

For example, I was wrong about point 2: recursive generators are indeed
useful and necessary. Three years ago, I couldn’t find any source that
clearly explained their purpose.

Regarding allocators — I must acknowledge that, despite most
implementations failing to handle them correctly, Lewis Baker’s generator
implementation does everything right.

That said, it appears to be the only correct implementation besides my own:
https://github.com/kelbon/kelcoro

I have studied all coroutine libraries known to me — Boost.Cobalt,
async_simple (by the implementer of coroutines in Clang), cppcoro, and
countless others.
All of them either ignore coroutine memory allocation altogether or
implement it inefficiently — for example, through polymorphic allocators or
something akin to std::any embedded directly inside the coroutine frame.
This is precisely what I would like to avoid in the standard
implementation, and I wanted to ensure that the C++ committee will take
care to make std::generator handle this correctly.

To me, this signals a deeper problem: the community doesn’t quite
understand how to use coroutines properly, which suggests that the standard
needs to make this part of C++ simpler and clearer.

For example, my implementation introduces a tag called
enable_resource_deduction, which allows allocator support to be added to
*any* coroutine simply by adding that tag.

This greatly simplifies allocator support for coroutines while avoiding the
need for the coroutine or its promise type to depend on the allocator type.
There is also a solution for the case where one wants to use a
default-constructed allocator without changing the coroutine signature —
but I won’t go into detail here to keep this message concise.

It seems to me that such a mechanism would be valuable to have in the C++
standard.

Additionally, my implementation uses the concept of a *memory resource (not
std one)* instead of a traditional allocator.
This reduces type requirements and enables implementations like this one:
https://github.com/kelbon/kelcoro/blob/main/include/kelcoro/stack_memory_resource.hpp

This particular resource requires that the last allocated object is always
the one being deallocated, which allows for the most efficient allocation
pattern for typical coroutine usage — where one coroutine cannot be
destroyed before another is resumed:

coro foo(resource) { co_await bar(resource); }

As for calling begin more than once — I’ll take that as a *feature* of my
generator :)
It allows for much more flexible use of generators beyond simple value
generation.
For example:
https://github.com/boostorg/json/issues/1102#issuecomment-3285365165

вс, 4 сент. 2022 г. в 16:25, Lewis Baker <lewissbaker_at_[hidden]>:

> On Sun, Sep 4, 2022 at 3:46 PM Nikl Kelbon <kelbonage_at_[hidden]> wrote:
>
>> > The design of std::generator allows an *optional* allocator type to be
>> specified.
>> It is possible in my realisation too. Default allocator is
>> std::allocator<std::byte>
>> In case when you want to not specify default constructible allocator you
>> can do
>>
>> generator<int> foo_coro(/*arguments with alloc*/)
>> generator<int> foo(/*arguments without alloc*/) {
>> return foo_coro(std::allocator_arg, Alloc{}, Args...);
>> }
>> In this case if allocator is empty compiler must optimize frame and do
>> not store allocator arg and alloc on frame
>> I dont see why it must be part of generator type and pollute the
>> interface + break vector<generator<X>> just for syntax sugar
>>
>
> What about the current design breaks vector<generator<X>>?
>
> I can write:
>
> std::vector<std::generator<int>> v;
> v.push_back([]() -> std::generator<int> { co_yield 42; }());
> v.push_back([](std::allocator_arg_t, SomeAlloc) { co_yield 42;
> }(std::allocator_arg, {}));
>
> It's possible we could add an implicit/explicit cast from generator<Ref,
> Val, Alloc> to generator<Ref, Val, void>.
>
> The point of the Alloc template arg was to make it more ergonomic to write
> coroutine functions that use custom stateless allocators without having to
> pollute the signature of the functions with allocator_arg arguments.
>
> > There is no extra type-erasure needed for allocators.
>> > Coroutines are themselves inherently type-erased
>>
>> I know that coroutines is a type erasure, but it is not clear that
>> implementation will not use additional type erasure and why Alloc = void
>> called 'type erasure case'
>>
>
> It's called the type-erasure case because the allocator is type-erased.
> The allocator can be type-erased without additional type-erasure above
> what the coroutine provides.
> It is only the deallocation that is type-erased. The allocation happens
> with static knowledge of the allocator type.
> Whether or not the implementation chooses to do additional type-erasure or
> takes advantage of the coroutine's inherent type-erasure is QoI.
>
>
>> > There is indeed some overhead for supporting the recursive generator
>> use-case
>>
>> This is why i say "Pay for what you dont use"
>>
>
> You are free to use a different generator type if you don't want the
> overhead of the recursive-generator type.
>
> If you think that we should also add a non-recursive generator type to the
> standard library then I'm sure that Library Evolution would welcome a paper
> on that.
>
>
>> > Current compilers generally struggle to inline/optimise operator++() as
>> this necessarily needs an extra level of indirection to be able to resume
>> different coroutines depending on what is the current leaf coroutine
>>
>> Because it is recursive generator. Is there a benchmarks where just
>> generator will lose perfomance compare to recusrive generator?
>>
>
> See "Performance & Benchmarks" section of P2168R3
> <https://wg21.link/P2168R3>
>
>
>> Recursive generator has overhead even when compiler optimized operator++,
>> we want to create many generators, use them like fast-write output / input
>> iterators etc etc
>>
>
> There is _some_ overhead, yes. Is it significant?
>
>
>> Let's image that recursive generator in recursive case 10% better then
>> just generator, but recursive generator used in 1% use cases when just
>> generator needed in 99% cases. So we will have perfomance loss in general.
>>
>
> In cases where the generator can be completely inlined there is a
> significant win.
> Non-recursive generator is more likely to be inlined here.
>
> In cases where the generator is not inlined there is very little
> difference in performance between recursive and non-recursive (maybe 1-2%).
>
>
>> > Would you expect begin() to advance the iteration to the next element?
>> It can be just documented, that generator::begin uses resume, but why it
>> need to be UNDEFINED behavior?
>>
>
> If the generator is already at the end() then calling begin() and having
> that unconditionally call resume() would be UB.
> To guard against this you'd need to conditionally call resume(), adding a
> branch to begin().
> I think I recall some feedback during the review process that this branch
> was unnecessary.
>
>
>> I dont see any case when compiler/implementation will optimize something
>> using this ub. Why not impl defined eventually?
>>
>
> Making calling begin() multiple times have implementation-defined
> behaviour would make code that relies on that non-portable.
>
>
>> And what about case when generator created by move from other generator?
>> Will it be UB to use begin first time for this generator?
>>
>
> Generally, you are not allowed to move a view/range after you start
> iterating over it.
>
> If you have not started iterating over the generator, move it, then call
> begin() - that should be fine.
>
>
>>
>> for(auto& x : gen) {
>> handle(x);
>> if(token.stop_requested())
>> break;
>> }
>> ... // time to continue
>> for(auto& x :gen) // UB
>>
>
> Yes, if you are writing a generic algorithm across input_range objects
> then attempting to iterate over them twice like this would be UB for at
> least some types that model input_range.
> I don't think input_range guarantees that you can call begin() multiple
> times, only forward_range or stronger gives you this.
>
> If you are going to try to continue iterating over an input_range in
> different for-loops then just use an iterator and do the iteration manually.
>
>
>> > You are not handling alignment requirements of the allocator object
>> here. Need to pad to an address that is a multiple of alignof(Alloc).
>>
>> I know and i leave a big comment about it. I dont really know why
>> operator new/delete used, why not just "allocate_frame" function like
>> await_suspend/await_resume/get_return_object?
>>
>
> I guess it may allow some reuse of wording around allocation/deallocation
> functions but otherwise there's no reason why it needed to be spelled
> 'operator new'.
>
>
>> Its really logically incorrect, this operator new used for corotuine
>> frame, but defined for promise type.
>>
>
> Yes. This is a known inconsistency.
>
>
>> And i dont see any guarantees about frame alignment, what alignment i
>> must use?)
>>
>
> Generally, the caller of an allocation function should be able to
> guarantee that it can construct any object of the requested size where that
> object does not have new-extended-alignment.
> So if it asks to allocate >=8 bytes then generally it should be 8 byte
> aligned.
> If it asks for >=16 bytes then it should be 16 byte aligned, etc.
> Up to __STDCPP_DEFAULT_NEW_ALIGNMENT__ which is the maximum alignment that
> the default operator new should be assumed to provide.
> Above this alignment, implementations are supposed to call the align_val_t
> overloads to obtain memory with greater alignment.
> However coroutines do not currently support dispatching to align_val_t
> overloads.
> So the compiler can generally only assume at most
> __STDCPP_DEFAULT_NEW_ALIGNMENT__ and if the frame requires greater
> alignment then they over-allocate and dynamically align within the returned
> buffer.
>
>
>> Why i cant use alloc which user provided and it is stored on frame? It is
>> overhead.
>>
>
> The problem here is that during deallocation the deallocation function
> called _after_ the parameter-copies stored in that allocated frame are
> destroyed.
> So you need to store an additional copy of a stateful allocator outside of
> the frame so that it's still available at the time when the deallocation
> function is called.
>
>
>> And main problem - why ALLOCATOR support required?
>>
>
> Lots of code-bases have existing custom-allocators that they have
> implemented and they want to be able to reuse them to customise allocations
> of things.
> It is useful to be able to reuse them to customise allocation of generator
> coroutine frames.
>
>
>> When it must be memory resource support. In this case it must be exactly
>> one non typed allocation. Use allocator for this is too much.
>>
>
> The reference implementation by Casey rebinds the allocator to a trivial
> type that has size/alignment of __STDCPP_DEFAULT_NEW_ALIGNMENT__ and then
> uses the allocator to allocate an array of those objects big enough to host
> the requested size.
>
>
>> What about some std:: type which will store this logic like "Im exactly
>> one allocation and deallocation from this object X, it can be memory
>> resource or allocator, and im a tag-type like std::allocator_arg_t"
>>
>
> I'm really not following what you're suggesting here.
>
>
>> generator<int> x(std::one_allocation_from<Alloc>);
>>
>> > If you are storing the promise-pointer anyway, you could store a
>> pointer to the base-class instead.
>> i dont see why std::coroutine_handle<void>::promise is not exist.
>>
>
>
>> MSVC implementation dont use nothing from promise type to get
>> promise(just reinterpret_cast), llvm implementation uses alignment, but why
>> dont provide alignment in promise method?
>>
>
> The MSVC ABI places the alignment padding before the stable-ABI part
> (resume-fn/destroy-fn pointers) which allows it to place the promise
> immediately after the function-pointers.
> The clang implementation places the alignment padding between the
> function-pointers and promise and so we need to know the promise alignment
> in order to calculate the address of the promise given the address of the
> function-pointers.
> The clang implementation could be changed to use the same ABI as MSVC - in
> which case it would be easier to implement a
> coroutine_handle<void>::promise() that returned the address of the promise.
>
> A paper would need to be written proposing to add this to
> coroutine_handle<void> and implementations for Clang and GCC would need to
> be changed so that their ABI allowed calculation of the promise address
> without knowing the promise-type.
>
>
>> > currently implemented coroutine ABIs
>> I dont see how to do not change ABI with coroutines. Changing code in
>> coroutine may change ABI, optimization in corotuine frame may change ABI
>>
>
> It is only the ABI-stable portion that needs to be considered here. ie.
> the ABI of the coroutine that coroutine_handle interacts with.
> The rest of the layout of the frame is type-erased and is handled by the
> relevant resume/destroy function pointers.
> Changing the body of a coroutine function does not change the ABI of the
> coroutine-frame.
>
> It is not an ABI break for those other aspects of the layout to change
> since things in other translation units will only ever interact with them
> through the coroutine_handle type, which only interacts with the ABI-stable
> part of the coroutine-frame.
>
>
>>
>> вс, 4 сент. 2022 г. в 06:07, Lewis Baker <lewissbaker_at_[hidden]>:
>>
>>>
>>>
>>> On Sat, Sep 3, 2022 at 6:19 PM Nikl Kelbon via Std-Proposals <
>>> std-proposals_at_[hidden]> wrote:
>>>
>>>> std::generator has huge problems.
>>>> I will briefly list them:
>>>> 1. The generator must not depend on the allocator (only the
>>>> promise_type depends), but std::generator has a template argument for
>>>> alloc, in particular, this prohibits adding 2 generators that allocated
>>>> memory differently into a container. It is possible to implement this
>>>> without type erasure.
>>>> Link to example implementation:
>>>>
>>>> https://github.com/kelbon/dd_generator/blob/4053fcff456c885d9330708d2e67f103c4212319/include/dd_generator.hpp#L264
>>>>
>>>
>>> The design of std::generator allows an *optional* allocator type to be
>>> specified.
>>>
>>> If you do not specify it then the allocator is determined solely by the
>>> arguments.
>>> If there is a std::allocator_arg + allocator object pair passed as the
>>> first two arguments of the coroutine then it will use that.
>>> Otherwise, it will fall back to using std::allocator by default.
>>> In this way it is possible to create a container of std::generator
>>> objects that have different allocator types by just not specifying the
>>> allocator in the template argument.
>>>
>>> If the template argument is specified then either:
>>> - the allocator is stateless in which case it uses a default-constructed
>>> allocator and avoids the user needing to explictly specify the allocator
>>> - the allocator is stateful and the caller still needs to pass an
>>> instance of the allocator using std::allocator_arg
>>>
>>> The main motivation for allowing the allocator as a template argument to
>>> std::generator is to allow the stateless-allocator use-case to be more
>>> ergonomic.
>>> e.g. so we can define an alias
>>>
>>> template<typename Ref, typename Val = remove_cvref_t<Ref>>
>>> using my_generator = std:::generator<Ref, Val, my_stateless_allocator>;
>>>
>>> And then can just use my_generator throughout my program and it will
>>> just use the custom allocator without having to worry about passing
>>> std::allocator_arg to every coroutine function.
>>>
>>> There is no additional type-erasure necessary for supporting these
>>> use-cases.
>>> We can just leverage the inherent type-erasure of coroutines in general
>>> to handle this.
>>> e.g. see
>>> https://github.com/lewissbaker/generator/blob/main/include/__generator.hpp#L277-L330
>>>
>>> 2. Overhead for ueslss default type erasure(default behavior) and stack
>>>> in generator (used only for element_of, which is basically for(auto&& v :
>>>> gen) co_yield v;
>>>> Its just uselss
>>>> You pay for what you dont use.
>>>>
>>>
>>> What "useless default type erasure" are you referring to here?
>>> There is no extra type-erasure needed for allocators.
>>> Coroutines are themselves inherently type-erased (which we can take
>>> advantage of).
>>>
>>> There is indeed some overhead for supporting the recursive generator
>>> use-case.
>>> Current compilers generally struggle to inline/optimise operator++() as
>>> this necessarily needs an extra level of indirection to be able to resume
>>> different coroutines depending on what is the current leaf coroutine.
>>>
>>> The recursive generator is more efficient for recursive use-cases but
>>> less efficient for non-recursive use-cases.
>>> Supporting 'co_yield elements_of(x)' is not possible without the
>>> recursive support.
>>>
>>> We discussed whether it would be better to add a non-recursive generator
>>> or a recursive generator.
>>> I think the consensus was that it was much harder to write your own
>>> recursive generator and so that had more value to exist in the standard
>>> library.
>>> We can add another non-recursive generator type in future if compilers
>>> are unable to optimise the recursive generator sufficiently.
>>>
>>>
>>>> 3. It is undefined behavior to call begin for std::generator twice. But
>>>> iteraing not all values in one loop is a very logical action and expected
>>>> behavior - just keep generating values
>>>> Besides, such an error is extremely non-obvious when using ranges or
>>>> range based for loop.
>>>>
>>>
>>> This is a typical restriction provided by many other input_range
>>> implementations.
>>>
>>> Would you expect begin() to advance the iteration to the next element?
>>> Or would you expect to be able to call begin() multiple times and each
>>> time just get a copy of the iterator pointing to the same element?
>>> Supporting either of these would potentially affect performance - either
>>> requiring more state to be held in the generator to keep track of whether
>>> this was the first call to begin() or not, or require an extra indirection
>>> when starting the coroutine for the first time (since a subsequent call to
>>> begin() might need to resume a different coroutine than the top-level
>>> coroutine).
>>>
>>>
>>>> 4. interface that is very obscure to the user and requires knowledge of
>>>> implementation to write efficient code
>>>> No one will write generator<const T&> foo(); without knowing why it is
>>>> needed and why it is better for perfmornace
>>>>
>>>
>>> Can you please expand on what part of this you think is "obscure"?
>>> The user needs to make a choice about whether they want to yield a
>>> sequence of lvalues, const-lvalues, rvalues.
>>> This will affect what kinds of values they can pass to co_yield and
>>> whether they incur extra copies/etc.
>>>
>>>
>>>> But there are way to write generator without such problems
>>>> https://github.com/kelbon/dd_generator
>>>>
>>>
>>> Line 43
>>> <https://github.com/kelbon/dd_generator/blob/9e33ee704444cc7f8d36f67c7358c9d19e770a0d/include/dd_generator.hpp#L43>:
>>> You are not handling alignment requirements of the allocator object here.
>>> Need to pad to an address that is a multiple of alignof(Alloc).
>>> Line 32,48: You also need to check for
>>> std::allocator_traits<Alloc>::is_always_equal
>>>
>>> The need for any_coroutine_handle (which stores the coroutine_handle and
>>> the promise pointer) and reinterpret_cast to get back to the 'value'
>>> data-member seems like a fairly unsafe way of implementing this.
>>> If you are storing the promise-pointer anyway, you could store a pointer
>>> to the base-class instead.
>>> e.g. see my generator<Ref, Val, use_allocator_arg> implementation
>>> <https://github.com/lewissbaker/generator/blob/e7c6c1c3009f0e9cf32add465b16f6295bfb9ac4/include/__generator.hpp#L789>
>>>
>>> Note that we cannot have coroutine_handle<void> return a pointer to the
>>> promise with the currently implemented coroutine ABIs because the offset
>>> between the frame pointer and the promise address is not necessarily a
>>> constant - there may be padding added to align a promise type with
>>> alignment >= 2*sizeof(void*). We would either need to change the ABI so
>>> that the padding was placed before the function-pointers, or we'd need to
>>> store an extra field to record the offset of the promise so we can compute
>>> its address correctly.
>>>
>>> I would also worry about the void* obtained from the
>>> coroutine_handle<void>::promise() being used in an unsafe way
>>> - it could be confused with address()/from_address()
>>> - you need to use reinterpret_cast<> which is difficult to get right
>>> without invoking UB
>>> These are minor concerns, however.
>>>
>>> - Lewis.
>>>
>>>

Received on 2025-10-05 06:10:20