Document number:   PxxxxR0
Date:   2021-07-13
Audience:   SG14

Opt-out thread_local lazy initialization

This document proposes a new mechanism for reliably opting out of lazy initialization of thread_local variables on critical paths, in the general case. It should support at least namespace-scope variables and static data members. It could also support local scope variables whose initialization can be statically determined not to depend on function arguments.

Under this proposal, a declaration or definition opts into eager initialization through a new attribute placed before it:

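struct A {
  A() { }     // user-provided, not constexpr
  int x{-1};
};

// a1 requires dynamic initialization; the proposed [[eager_init]] attribute
// requests that it happen when each thread starts rather than on first use.
[[eager_init]] thread_local A a1;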

All declarations of a given namespace-scope [[eager_init]] thread_local variable must carry the [[eager_init]] attribute. This eliminates ambiguity over its initialization semantics. Implementations will need to extend symbol naming (name mangling) across translation units to enforce the rule.

1. Overview

The standard states in [basic.stc.thread]/1 that:

All variables declared with the thread_local keyword have thread storage duration. The storage for these entities lasts for the duration of the thread in which they are created. There is a distinct object or reference per thread, and use of the declared name refers to the entity associated with the current thread.

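The language guarantees that a thread's variable is initialized before that thread first uses it. It leaves implementation-defined whether dynamic initialization of a non-block thread_local variable happens at thread start or is deferred ([basic.start.dynamic]). It makes no rule on whether variables a thread never uses must, may, or must not be initialized.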

In practice, major compilers have consistently deferred the initialization of thread_local variables that require dynamic initialization. This has generally been acceptable for most applications. It is likely far better overall than the alternative of eagerly initializing every such variable in every thread.
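The following C++17 sketch makes the cost concrete. It hand-writes roughly what deferred initialization amounts to for a variable like a1: a per-thread guard flag, raw per-thread storage, and an accessor that every use must go through. It is an illustrative approximation, not any compiler's actual lowering; compilers following the Itanium C++ ABI instead emit generated TLS wrapper and init functions. The names a1_access, a1_initialized, a1_storage, g_constructions and demo are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <new>
#include <thread>

// Counts constructions across all threads (for observation only).
std::atomic<int> g_constructions{0};

// Mirrors the document's A: a non-constexpr constructor forces dynamic
// initialization. A is trivially destructible, so no destructor
// registration is modelled here.
struct A {
  A() { g_constructions.fetch_add(1, std::memory_order_relaxed); }
  int x{-1};
};

// Hand-written stand-in for `thread_local A a1;`. Both objects below are
// constant-initialized, so they need no dynamic initialization themselves.
thread_local bool a1_initialized = false;
alignas(A) thread_local unsigned char a1_storage[sizeof(A)];

// Every use of a1 goes through this accessor and pays the guard check.
A& a1_access() {
  if (!a1_initialized) {                         // per-access branch
    ::new (static_cast<void*>(a1_storage)) A();  // first use on this thread
    a1_initialized = true;
  }
  return *std::launder(reinterpret_cast<A*>(a1_storage));
}

// Starts one thread that never uses a1 and one that accesses it three
// times; returns the total number of A objects constructed.
int demo() {
  std::thread idle([] { /* never touches a1 */ });
  idle.join();
  std::thread user([] {
    a1_access().x = 1;   // first access constructs A
    a1_access().x += 1;  // later accesses only test the flag
    assert(a1_access().x == 2);
  });
  user.join();
  return g_constructions.load();
}
```

A thread that never touches a1 pays nothing, which is why deferral is a sensible default. The price is the branch in a1_access on every use, including every use after the first.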

2. Motivation

On the critical paths of highly latency-sensitive systems, lazy TLS initialization is an unwelcome overhead that is hard to avoid.

The compiler generates a branch at every access point. Usually the branch is predicted correctly and costs (nearly) nothing, since it is a not-taken jump that does not occupy a branch target buffer slot. Predictors are imperfect, however, and can mispredict because of aliasing. The extra checks also impose a small code-density penalty.

Compiler-specific mechanisms (e.g., GCC's __thread) provide thread-local storage without lazy-initialization checks, but they do not support dynamic initialization. They can be combined with various frameworks that register and dispatch per-thread setup and teardown functions. These combinations are not clean, and they are fragile when application code creates threads that bypass the framework.
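The workaround described above can be sketched in standard C++17. It assumes a hypothetical framework entry point, spawn_with_setup. The object lives in constant-initialized thread_local storage, playing the role __thread plays in the text, so accesses need no guard. The framework constructs and destroys the object around the thread's body. A thread created directly with std::thread skips that setup, which is the fragility noted above; in this sketch it is caught only by a debug assertion. All names (a1, a1_ready, a1_storage, spawn_with_setup and the two helper functions) are hypothetical.

```cpp
#include <cassert>
#include <new>
#include <thread>
#include <utility>

// Hypothetical type requiring dynamic initialization.
struct A {
  A() : x(42) {}
  int x;
};

// Constant-initialized per-thread storage: no compiler-generated guard,
// but also no automatic construction.
thread_local bool a1_ready = false;
alignas(A) thread_local unsigned char a1_storage[sizeof(A)];

// Hot-path accessor: no initialization check, only a debug assertion.
A& a1() {
  assert(a1_ready && "thread was not started through spawn_with_setup");
  return *std::launder(reinterpret_cast<A*>(a1_storage));
}

// Hypothetical framework entry point: constructs A when the thread starts
// and destroys it before the thread exits.
template <class F>
std::thread spawn_with_setup(F f) {
  return std::thread([f = std::move(f)]() mutable {
    A* p = ::new (static_cast<void*>(a1_storage)) A();
    a1_ready = true;
    f();
    a1_ready = false;
    p->~A();
  });
}

// Value of a1().x observed by a thread started through the framework.
int framework_thread_value() {
  int seen = 0;
  std::thread t = spawn_with_setup([&seen] { seen = a1().x; });
  t.join();
  return seen;
}

// Whether a thread created with plain std::thread had a1 set up.
bool plain_thread_was_set_up() {
  bool ready = true;
  std::thread t([&ready] { ready = a1_ready; });  // escapes the framework
  t.join();
  return ready;
}
```

Here plain_thread_was_set_up() returns false. Nothing prevents user code from creating such a thread and then calling a1(), which in a release build would access an object that was never constructed.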

Thread-local variables are of particular importance in logging and instrumentation frameworks. These frameworks are often invoked directly on critical paths, so the cost of lazy-initialization checks is especially unwelcome there.

3. Design rationale

3.1. What needs to be enforced

  1. All extern declarations of a given thread_local variable must agree on the presence or absence of the [[eager_init]] attribute.
  2. To enforce this, implementations will need some form of name mangling that distinguishes [[eager_init]] thread_local variables.

3.2. Interfaces not being considered

4. Open Issues

4.1. Local scope support

Block-scope thread_local variables whose initialization can be statically determined not to depend on function arguments may be eligible for a similar opt-in bypass of lazy initialization.
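A block-scope thread_local variable is initialized the first time control passes through its declaration on each thread ([stmt.dcl]). The C++ sketch below uses hypothetical functions thread_label, first_scale and show_first_call_wins to illustrate the distinction this open issue draws. The first initializer uses no function arguments, so it could in principle be hoisted to thread start. The second captures the argument of whichever call happens first on the thread, so eager initialization could not reproduce its value.

```cpp
#include <cassert>
#include <string>

// Initializer uses no function arguments: the kind of block-scope variable
// this open issue suggests could be initialized eagerly.
const std::string& thread_label() {
  thread_local const std::string label = "worker";  // same value on every thread
  return label;
}

// Initializer depends on the argument of the first call on each thread,
// so it cannot be evaluated at thread start.
int first_scale(int scale) {
  thread_local const int captured = scale * 10;  // runs on first call only
  return captured;
}

// On a single thread, the first call's argument wins.
void show_first_call_wins() {
  assert(first_scale(2) == 20);
  assert(first_scale(7) == 20);  // initializer is not re-run
}
```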