Hello, all.

Coroutines are a powerful tool for async programming in C++20, but currently, they don't integrate well with some programming patterns that assume that control flow for a function always stays on the same thread.

One major use-case here is thread-locals. Sometimes thread-locals are used to adapt older single-threaded APIs to support multiple contexts in multithreaded environments; other times they're used to pass additional context to an existing API that didn't previously take any.

The example use-case I'll use here is logging.
Suppose our program has a logging function, say with signature "WriteLog(LogLevel level, const char* format, va_list va)", that writes a line to a program-global log file (or possibly some other log output destination). It might accompany the caller-passed message with some basic data like the current date and time, the ID of the thread writing the log message, and so on.

If we wanted to provide more context to those log lines, we could simply prefix the context data into the format string, but doing so consistently across many call sites in an area of code can be cumbersome, particularly if the context we wish to add is dynamic. If the context is computed several stack frames up from the log call, this may require substantial API changes to support. So instead, the developer may turn to thread-local storage as a simpler way to provide this functionality.

To implement this functionality using thread-local storage, the programmer might create a thread_local std::list<std::string> (or other iterable container) to store the context stack for the current thread, provide PushContext(std::string context) and PopContext() methods to manage the stack, and automatically add the current context to log lines. This of course begs for an RAII ScopedLogContext class, which pushes context objects onto the stack at construction and pops them off at destruction. This works well in threaded code.
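To make that concrete, here is a minimal sketch of the thread-local approach. The names g_contextStack and CurrentPrefix are placeholders I made up for illustration, and this ScopedLogContext takes a single string rather than the two arguments used in the example further down; a real logger would call something like CurrentPrefix() from inside WriteLog.

```cpp
#include <list>
#include <string>
#include <utility>

// One context stack per thread (hypothetical name; requires C++17 for inline variables).
inline thread_local std::list<std::string> g_contextStack;

inline void PushContext(std::string context) {
    g_contextStack.push_back(std::move(context));
}

inline void PopContext() {
    if (!g_contextStack.empty()) g_contextStack.pop_back();
}

// Hypothetical helper: joins the current thread's contexts into a prefix
// such as "[Outer][Inner] " that the logger could prepend to each line.
inline std::string CurrentPrefix() {
    std::string prefix;
    for (const std::string& ctx : g_contextStack) {
        prefix += '[';
        prefix += ctx;
        prefix += ']';
    }
    if (!prefix.empty()) prefix += ' ';
    return prefix;
}

// RAII helper: pushes a context on construction and pops it on destruction.
class ScopedLogContext {
public:
    explicit ScopedLogContext(std::string context) { PushContext(std::move(context)); }
    ~ScopedLogContext() { PopContext(); }
    ScopedLogContext(const ScopedLogContext&) = delete;
    ScopedLogContext& operator=(const ScopedLogContext&) = delete;
};
```

As long as every push and its matching pop happen on the same thread with nothing else interleaved, the stack always mirrors the call stack.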

However, this technique immediately runs into problems with coroutines. For instance, take this function:

SomeAwaitableType SomeFunction(Arg arg)
{
  ScopedLogContext logCtx("SomeFunction", arg.someContextMember);
  WriteLog("Some Log Line %i", arg.someOtherMember);
  auto ret = co_await SomeOtherFunction(arg);
  WriteLog("Got ret: %i", ret.someMember);
  co_return DoSomePostProcessing(ret);
}

This code looks reasonable, but in reality, logCtx will outlive control flow exiting SomeFunction if the co_await suspends! This means that the context data it pushed onto the thread-local stack will still be there in the calling function, and that it likely won't be there when the second WriteLog call is made, or when logCtx goes out of scope and its destructor attempts to pop context.

Currently, it's possible to address these kinds of problems by making use of the promise type's initial_suspend, final_suspend, and await_transform methods, but only if the developer has control over the coroutine_traits for SomeAwaitableType. Doing so is also fairly complex, particularly since you'd need to implement a whole wrapper Awaiter, and reproducing the operator co_await lookup by hand is awkward.

My proposed solution:
I'm proposing a new customization point to help developers integrate thread-locals with coroutines cleanly and without requiring invasive changes to their awaitable types. This doesn't completely solve these problems, but it makes it easy for a developer to work around them fairly painlessly.

The customization point is the new struct std::coroutine_state_traits<LocalObjectType>, which may provide an aux_state_type member. If this member exists for the type of any local variable that will be in scope during any co_await, then an aux_state_type object is instantiated as part of the coroutine state before initial_suspend is called, using its default constructor. If multiple such state objects are created, their constructors are run in the order the corresponding local variables are declared in the coroutine.

The aux_state_type struct has two methods, suspending() and resuming(), each taking no arguments and returning void.

The state objects' suspending methods are called before an awaiter's await_suspend in the reverse of their order of construction.
The state objects' resuming methods are called after an awaiter's await_resume in their order of construction.

The state objects' destructors are called after local variables within the coroutine are destroyed, before the promise's final_suspend method is called, in the reverse of their order of construction.

Using this mechanism, our ScopedLogContext type could provide an aux_state_type which stores its own std::list<std::string>. On construction, it would copy the thread-local stack, and on destruction it would move-assign its internal stack over the global one. The suspending and resuming methods could both std::swap the internal stack with the global one. This ensures that the thread-local state will be the same after the coroutine is suspended or exits as it was before it was started or resumed, and vice versa.
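A rough sketch of such an aux_state_type follows. Since the customization point doesn't exist, the traits template lives in a made-up namespace "proposed", LogContextAuxState and g_contextStack are names I invented, and SimulateCoroutineLifetime makes by hand the calls the compiler would generate under this proposal.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical thread-local context stack, standing in for the logger's own.
thread_local std::list<std::string> g_contextStack;

class ScopedLogContext;  // the RAII type from earlier; only its name is needed here

// Sketch of a possible aux_state_type for ScopedLogContext.
class LogContextAuxState {
public:
    // Snapshot the current thread's stack when the coroutine state is created.
    LogContextAuxState() : saved_(g_contextStack) {}
    // On coroutine exit, put back the stack that was saved.
    ~LogContextAuxState() { g_contextStack = std::move(saved_); }
    LogContextAuxState(const LogContextAuxState&) = delete;
    LogContextAuxState& operator=(const LogContextAuxState&) = delete;

    // Stash the coroutine's stack and expose the saved (outside) one.
    void suspending() { std::swap(saved_, g_contextStack); }
    // Stash the resumer's stack and restore the coroutine's.
    void resuming() { std::swap(saved_, g_contextStack); }

private:
    std::list<std::string> saved_;
};

namespace proposed {  // hypothetical stand-in for std under this proposal
template <class T> struct coroutine_state_traits {};  // default: no aux state
template <> struct coroutine_state_traits<ScopedLogContext> {
    using aux_state_type = LogContextAuxState;
};
}  // namespace proposed

// Hand-written simulation of the proposed call sequence; records stack depth.
std::vector<std::size_t> SimulateCoroutineLifetime() {
    std::vector<std::size_t> depths;
    g_contextStack.clear();
    g_contextStack.push_back("Caller");
    {
        LogContextAuxState aux;                     // before initial_suspend
        g_contextStack.push_back("SomeFunction");   // ScopedLogContext constructed
        depths.push_back(g_contextStack.size());    // inside the coroutine
        aux.suspending();                           // before await_suspend
        depths.push_back(g_contextStack.size());    // what the caller sees
        aux.resuming();                             // on resumption
        depths.push_back(g_contextStack.size());    // inside the coroutine again
        g_contextStack.pop_back();                  // ScopedLogContext destroyed
    }                                               // aux destroyed
    depths.push_back(g_contextStack.size());        // back in the caller
    g_contextStack.clear();
    return depths;
}
```

In the simulation the caller only ever sees its own single entry, while the coroutine sees both entries each time it is running.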

I'm not married to this specific implementation, but I hope I've gotten across the use-case for and benefit of providing this functionality.

Tangential additional STL improvement:
As mentioned above, it's currently somewhat awkward to implement a wrapper awaiter to return via await_transform. This could be solved by providing that functionality in the STL:

template<class T> decltype(auto) std::get_awaiter(T&& obj)

This would return the awaiter o for argument a = obj, obtained as described in C++20 7.6.2.3 expr.await#3.3.

Thanks for reading, and apologies for the massive wall of poorly-formatted text!

--Ridley Combs