std-proposals: Re: Resolving Static Initialization Order Fiasco by standardizing `weak` and `naked` symbols

From: Thiago Macieira <thiago_at_[hidden]>
Date: Thu, 06 Jun 2019 16:17:51 -0700

On Wednesday, 5 June 2019 14:00:54 PDT Omer Rosler via Std-Proposals wrote:
> Hello,
> This post is supposed to be a motivation for standardized `weak` and
> `naked` attributes.
> If this motivation is deemed worthy, I will write a proposal.
> Note I am not a compiler/linker expert and I have no idea on the
> implementability of the proposed feature and will need help.
>
> I want to start a discussion both on `weak` and `naked` as well as the
> resolution to SIOF using the described construct.
>
> *Terminology*
>
> - a definition is said to be weak if we could provide another definition
> of the same entity, and if another non weak definition is given, the non
> weak definition is used by the linker.

This is the definition the linkers use.

> - a definition is said to be naked if we could provide more naked
> definitions of the same entity and the linker will choose one of them.

As in any of them? How is this different from a weak symbol in the absence of
a non-weak one?

Note that GNU binutils' nm shows both W and V for weak symbols: the V is used
for objects and W for everything else.
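For readers who have not used weak symbols, here is a minimal sketch using the GCC/Clang `__attribute__((weak))` extension. It is not standard C++, and the function names are hypothetical, chosen only for illustration. Compiled on its own, the weak definition is the one that gets called; a non-weak definition of the same function in another TU would replace it at link time.

```cpp
#include <cassert>

// Hypothetical hook with a weak default definition (GCC/Clang extension,
// not standard C++). A non-weak definition of default_log_level() in
// another TU would be chosen by the linker in preference to this one.
__attribute__((weak)) int default_log_level() { return 1; }

// Callers use the function normally; which definition they reach is
// decided at link time.
int effective_log_level() { return default_log_level(); }
```

Running nm on the resulting object file should list default_log_level with the W marker described above.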

> Note that most vendors support these in some way or another (because it is
> useful), but the behavior is slightly different in each one.
> This is why the standard exists - to standardize common practice!

Can you point to existing practice of naked symbols? It's not a term I've
heard before.

> *Single Translation Unit*
> There are two parts of resolving this per translation unit.
> 1. Detecting the "correct" order which is the tool's job and not the
> standard:
> This is a hard problem, but it seems this can be solved (in a single
> TU) assuming there aren't many globals defined in a single TU (unless there
> are many inline variables in the codebase, we're probably fine).
>
> 2. Apply a code transformation on the globals that guarantees their
> initialization order.

I would say that for a single TU, there's nothing to be solved, since the
initialisation order is already well-defined: it is the order in which the
variables have been defined. In the absence of a forward declaration and of
weak symbols, a variable B can only refer to variable A if it has been
previously defined. That is the order in which they will be initialised.

So this is entirely under the programmer's control. Why should we need to fix
anything?
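A small sketch of that guarantee, with hypothetical names (Tracer, init_log) that are not part of the original post: within one TU, dynamically initialised namespace-scope variables are initialised in the order of their definitions.

```cpp
#include <cassert>
#include <string>

// Defined first, so it is initialised before the objects below.
std::string init_log;

// Appends its name to the log when constructed.
struct Tracer {
    explicit Tracer(char name) { init_log += name; }
};

// Initialised in exactly this order, because they share one TU.
Tracer a('a');
Tracer b('b');
Tracer c('c');
```

By the time main() starts, init_log holds the names in definition order.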

> //Filesystem.h
> #include "MemoryDevice.h"
> struct Filesystem {
> Filesystem(MemoryDevice& memory) {}};
> //declare all global variables in a struct for assured order of
> initialization
> template<>struct globals_t<"Filesystem"> {
> ProtectedMemoryDevice protected_block_device;
> Filesystem protected_fs;
> MemoryDevice unprotected_block_device;
> Filesystem unprotected_fs;
> //initialization code goes here
> globals_t(): protected_block_device(),
> protected_fs(protected_block_device),
> unprotected_block_device(),
> unprotected_fs(unprotected_block_device) {}
> };
>
>
> //Add legacy definitions for the global variables, which are for TUs
> depending on this one, which were not refactored yet
>
> extern ProtectedMemoryDevice& [[deprecated("Use
> globals().protected_block_device")]] protected_block_device;
> extern Filesystem& [[deprecated("Use globals().protected_fs")]]
> protected_fs; extern MemoryDevice& [[deprecated("Use
> globals().unprotected_block_device")]] unprotected_block_device;
> extern Filesystem& [[deprecated("Use globals().unprotected_fs")]]
> unprotected_fs;

Sorry, your code is mangled and is missing at least one closing brace, which
makes the interpretation ambiguous. You forgot to close the Filesystem struct,
which meant that the variables were members, not globals. But I guess you
meant that they were globals, due to the use of "extern" (which is not allowed
in a class body).

But I dispute the comment where you say "declare all global variables in a
struct for assured order of initialization". The order of initialisation is
*already* well-defined if you put all variables in the same TU. The order in
which they will be initialised is the order you declare them, which is exactly
the same as if they were non-static members of a struct (which we can call
"globals_t").

The only reason I would see for putting them inside the same struct, as
opposed to namespace scope, is to pack them in memory. As namespace variables,
the linker is allowed to organise them as it sees fit. This has no bearing on
the order in which they will be initialised, but could have an impact on cache
use or collisions.
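To illustrate the comparison, here is a sketch under simplified assumptions: Device, Filesystem and Globals are hypothetical stand-ins for the quoted types, not the original code. Both forms initialise the device before the filesystem that uses it; only the struct form fixes the relative layout in memory.

```cpp
#include <cassert>
#include <cstddef>

struct Device { int id = 7; };

struct Filesystem {
    explicit Filesystem(Device& d) : device(&d) {}
    Device* device;
};

// Hypothetical stand-in for the quoted globals_t: members are initialised
// in declaration order and laid out in that order within one object.
struct Globals {
    Device device;
    Filesystem fs{device};
};

Globals globals;

// The same objects at namespace scope: initialised in the same order,
// but the linker decides where each one is placed.
Device loose_device;
Filesystem loose_fs{loose_device};
```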

> Overriding the weak definition of `globals<"Filesystem">` enables control
> of the order of initializations across translation units.
> Now why the virtual inheritance and naked definitions: to resolve diamond
> dependencies "automagically".
> Assume Four TUs: A, B, C, D
> D depends on B, C
> B and C both depend on A

Please explain how this works in a non-diamond, open hierarchy. For example,
your exact example but with only A, B, and C (no D).

> *Advantages*
>
> - It solves the static initialization order fiasco
> - It has zero runtime overhead
> - It can be automated
> - Every access to global variables would look like `globals().var`- i.e.
> the reader (both human and compiler) could detect `pure`ness very simply.
> - We get static constructors/destructors for free: put them in the body of
> the ctor of `globals_t`

There could be a much simpler solution: a way to tell compilers that TU A
depends on globals from TU B, so the linker should sort the initialiser
functions such that all of B's initialisations happen before A's begin.

The problem with this is that it requires introducing into the standard a
means of one TU referring to another. This might actually be the trickiest
part of that proposal.

Have you looked into an automated way of doing this? The compiler knows which
symbols require dynamic initialisation at load time, so it could leave a
marker for the linker. In turn, the linker knows which TUs used those symbols.
Can it not sort the initialisations by itself?
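For comparison, a manual form of this sorting already exists as a vendor extension: the GCC `init_priority` attribute (also accepted by Clang on ELF targets). The sketch below is not the automated scheme asked about above, and the type and variable names are hypothetical. It only shows that initialisers can be reordered by an annotation instead of by definition order.

```cpp
#include <cassert>
#include <string>

// Function-local static, so the log exists whenever a constructor needs it.
std::string& init_log() {
    static std::string log;
    return log;
}

// Appends its template argument to the log when constructed.
template <char Name>
struct Tracer {
    Tracer() { init_log() += Name; }
};

// GCC extension: lower priority numbers are initialised first,
// regardless of definition order. Valid values are 101 to 65535.
Tracer<'A'> depends_on_b __attribute__((init_priority(2000)));
Tracer<'B'> b            __attribute__((init_priority(1000)));
```

With this attribute the programmer still has to pick the numbers by hand, which is the part an automated linker-side sort would remove.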

>
> *Disadvantages*
>
> - It does not solve the static *de*initialization order fiasco, but
> forces an order instead of an undefined one (which is better IMO). But
> this is a breaking change for a code base

The de-initialisation order is always the reverse of the initialisation one.

> *Proposals*
> This is comprised of three parts
> 1. [Core] Add `weak` and `naked` context sensitive keywords (they have
> observable effect on the program, therefore cannot be attributes according
> to the guidelines).
> The behaviour of those keyword in the standard would be defined by
> exceptions to the ODR rule.

I do support the idea for a weak attribute or language construct. Though I
have to say it doesn't happen often enough that I've needed it in the
standard. Using compiler extensions has been fine so far. As for naked, see
above.

Anyway, they are not ODR violations any more than inline functions and
variables (and variables inside inline functions) are. They are defined
exactly the same way in all TUs that use them. Violate this and you find
yourself in UB-land. The only difference is that you can have multiple,
identical copies in as many TUs.

> 2. [Library] Add to the standard library the needed ingredients for this
> construct.

I'm not convinced that your library solution solves anything that wasn't
already solved. So please continue iterating.

> 3. [SG15] Some description of the code transformation of the tool (I'm not
> sure what can actually be proposed there).

What tool?

> The motivation to put the library in the standard as well is that in the
> (optimistic, non realistic) future where the access to globals everywhere
> is via `lib::globals()` this "lib" should very definitely be "std"
> especially as this library would be a very complex template machine which
> might need compiler hooks for optimized implementation.

If we're going to access globals via functions, then the problem is
practically solved anyway. Instead of using namespace-scope globals, use
function-level statics, which are initialised on first use and deinitialised
in reverse order. So long as you don't have a dependency cycle, there's never
an ordering problem, at least for initialisation.

Destruction often does have ordering problems, but these are solved by coding
practices and by doing as little as possible in destructors.

As an added benefit, you have lazy construction: no object is initialised
until and only if it is actually used by something. In your example, the
logger would not initialise unless something actually tried to log.
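A minimal sketch of the function-level static approach, assuming hypothetical Logger and Filesystem types and a counter added purely for illustration. Each object is constructed on the first call to its accessor, and a dependency is satisfied simply by calling the other accessor from the constructor.

```cpp
#include <cassert>
#include <string>

int constructed = 0;  // counts constructions, for illustration only

struct Logger {
    Logger() { ++constructed; }
    void log(const std::string& msg) { last = msg; }
    std::string last;
};

// Initialised the first time logger() is called, not at program start.
Logger& logger() {
    static Logger instance;
    return instance;
}

struct Filesystem {
    // Calling logger() here forces the Logger to exist first.
    Filesystem() { logger().log("filesystem up"); ++constructed; }
};

Filesystem& filesystem() {
    static Filesystem instance;
    return instance;
}
```

Nothing is constructed until filesystem() is first called. Because the Logger finishes construction before the Filesystem does, it is also destroyed after it.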

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel System Software Products

Received on 2019-06-06 18:20:21