This is the resolution for 2. I originally envisioned it as a library-only solution, but it requires several ODR violations that seem reasonable yet that none of the major compilers implement: a combination of `weak` and `naked` attributes.
I will describe the library solution here; it assumes the existence of `weak` and `naked` context-sensitive keywords.
- Zero runtime overhead (no static reference counters as in the Nifty Counter Idiom, which is cited as a solution to SIOF in the C++ FAQ)
- Define the deinitialization order (not necessarily resolve the deinitialization fiasco)
- The only required change to the content of a TU is where we access the global variables.
- Works equally (ish) for static member variables.
Library Description
We assume the TU knows the correct order of initialization (we assume a tool analyzed this before the code transformation), so we only need to group the globals into a structure where the order of deinitialization is defined.
//lib's most basic API
#include <type_traits>
template<char... TU>
struct globals_t {};
template<char... TU>
weak globals_t<TU...>& globals() {
static globals_t<TU...> ret; //by default, the globals' ctor runs here
return ret;
}
inline auto& get_globals()
{
using return_t = globals_t<__SOME_COMPILER_SPECIFIC_MACRO_FOR_CURRENT_TU_NAME>;
static_assert(std::is_default_constructible<return_t>::value, "Your specialization is wrong");
return globals<__SOME_COMPILER_SPECIFIC_MACRO_FOR_CURRENT_TU_NAME>();
}
The TU can access the global variables after initialization using `get_globals()`, and the TU defines its globals by specializing `globals_t` and replacing the implementation of `globals()`.
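Most of this basic API can be approximated in standard C++ today, except for the overriding that `weak` is meant to enable. The following is only a sketch under stated assumptions: the tag type `FilesystemTU` stands in for the TU-name character pack, and the two `std::string` globals (`root`, `log_path`) and the helper `log_path_is_under_root()` are made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical tag type standing in for the TU name; the text above uses a
// character pack supplied by a compiler-specific macro instead.
struct FilesystemTU {};

// Primary template: a TU without globals.
template <class TU>
struct globals_t {};

// This TU's globals. Members are constructed in declaration order and
// destroyed in reverse declaration order.
template <>
struct globals_t<FilesystemTU> {
    std::string root;      // illustrative global
    std::string log_path;  // illustrative global that depends on `root`
    globals_t() : root("/srv"), log_path(root + "/log") {}
};

// Function-local static: constructed on first call. Unlike the proposed
// `weak` definition, another TU cannot override this one.
template <class TU>
globals_t<TU>& globals() {
    static_assert(std::is_default_constructible<globals_t<TU>>::value,
                  "Your specialization is wrong");
    static globals_t<TU> ret;
    return ret;
}

inline globals_t<FilesystemTU>& get_globals() {
    return globals<FilesystemTU>();
}

// Example access: every use of a global goes through get_globals().
inline bool log_path_is_under_root() {
    const globals_t<FilesystemTU>& g = get_globals();
    assert(!g.root.empty());
    return g.log_path.compare(0, g.root.size(), g.root) == 0;
}
```

Because `log_path` is declared after `root`, it can safely use `root` in its initializer; that is the whole ordering guarantee inside a single TU.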
For scalability, every TU's `globals_t` must contain the `globals_t` structure of every dependency it has, but this means the default definition of `globals<Base>()` is wrong when compiling the dependent TU. This is where the `weak` definition comes in: we override the definition of the `globals()` function. For example:
//Filesystem.h
#include "MemoryDevice.h"
struct Filesystem {
Filesystem(MemoryDevice& memory) {}
};
//declare all global variables in a struct for assured order of initialization
template<>
struct globals_t<"Filesystem"> {
ProtectedMemoryDevice protected_block_device;
Filesystem protected_fs;
MemoryDevice unprotected_block_device;
Filesystem unprotected_fs;
//initialization code goes here
globals_t(): protected_block_device(),
protected_fs(protected_block_device),
unprotected_block_device(),
unprotected_fs(unprotected_block_device) {}
};
//Add legacy declarations for the global variables, for TUs that depend on this one and have not been refactored yet
[[deprecated("Use globals().protected_block_device")]] extern ProtectedMemoryDevice& protected_block_device;
[[deprecated("Use globals().protected_fs")]] extern Filesystem& protected_fs;
[[deprecated("Use globals().unprotected_block_device")]] extern MemoryDevice& unprotected_block_device;
[[deprecated("Use globals().unprotected_fs")]] extern Filesystem& unprotected_fs;
//Filesystem.cpp
ProtectedMemoryDevice& protected_block_device = get_globals().protected_block_device;
Filesystem& protected_fs = get_globals().protected_fs;
MemoryDevice& unprotected_block_device = get_globals().unprotected_block_device;
Filesystem& unprotected_fs = get_globals().unprotected_fs;
//Logger.h
//`Logger` depends on Filesystem, therefore the globals of `Logger` must be initialized afterwards.
#include "Filesystem.h"
//define LogFile and LoggerTaskThread here
//the inheritance explained below
template<>
struct globals_t<"Logger"> : public virtual globals_t<"Filesystem">
{
LogFile log_file;
LoggerTaskThread logger_task;
globals_t(): log_file(this->protected_fs.open("log", "w")),
logger_task(log_file) {}
};
//legacy declarations here...
// override the weak default definition; `naked` is explained below
template<>
extern naked globals_t<"Filesystem">& globals();
//Logger.cpp
//legacy definitions here...
template<>
globals_t<"Filesystem">& globals() {
return static_cast<globals_t<"Filesystem">&>(globals<"Logger">());
}
Overriding the weak definition of `globals<"Filesystem">` enables control of the order of initializations across translation units.
Now, why the virtual inheritance and the `naked` definitions? To resolve diamond dependencies "automagically".
Assume four TUs: A, B, C, D
D depends on B, C
B and C both depend on A
All have global variables. Now, if D's `globals_t` inherits from both B's and C's `globals_t`, the overridden definitions of `globals<"A">()` return the same address, but without the `naked` specifier we'll have an ODR violation, even though we don't really care about it.
The scalability comes from the fact that we only need to override the definitions of our direct dependencies' `globals()` functions and provide legacy definitions for our own TU.
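The virtual-inheritance part of the diamond can be checked in standard C++ without `weak` or `naked`. In this sketch the tag types `A_TU` to `D_TU`, the `init_log()` helper, and the functions `globals_A_via_B()` / `globals_A_via_C()` are assumptions that stand in for the two overridden definitions of `globals<"A">()`; it only demonstrates that both paths reach the same `A` subobject and that construction follows the dependency order.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which the globals_t constructors run.
inline std::vector<std::string>& init_log() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical tag types standing in for the TU names "A".."D".
struct A_TU {};
struct B_TU {};
struct C_TU {};
struct D_TU {};

template <class TU>
struct globals_t {};

template <>
struct globals_t<A_TU> {
    int a = 1;
    globals_t() { init_log().push_back("A"); }
};

// B and C both depend on A, so A's globals are a virtual base of each.
template <>
struct globals_t<B_TU> : virtual globals_t<A_TU> {
    int b;
    globals_t() : b(a + 1) { init_log().push_back("B"); }
};

template <>
struct globals_t<C_TU> : virtual globals_t<A_TU> {
    int c;
    globals_t() : c(a + 2) { init_log().push_back("C"); }
};

// D depends on B and C; the shared A subobject is constructed exactly once.
template <>
struct globals_t<D_TU> : virtual globals_t<B_TU>, virtual globals_t<C_TU> {
    int d;
    globals_t() : d(b + c) { init_log().push_back("D"); }
};

inline globals_t<D_TU>& globals_D() {
    static globals_t<D_TU> ret;
    return ret;
}

// Stand-ins for the definitions of globals<"A">() that B's and C's
// dependents would each provide.
inline globals_t<A_TU>& globals_A_via_B() {
    return static_cast<globals_t<B_TU>&>(globals_D());
}
inline globals_t<A_TU>& globals_A_via_C() {
    return static_cast<globals_t<C_TU>&>(globals_D());
}

inline void check_diamond() {
    assert(&globals_A_via_B() == &globals_A_via_C());  // one shared A
    assert(globals_D().d == 5);                        // (1 + 1) + (1 + 2)
}
```

Calling `check_diamond()` constructs D's globals once; `init_log()` then holds `A`, `B`, `C`, `D` in that order, since virtual bases are initialized depth-first, left to right, before the most derived class.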
This is the main idea.
For static globals, we can put them in the `globals_t` struct as private members, and befriend everything dependent on it (this is one file only, so this is scalable).
Inline globals (or static globals in header files) can be handled with template metaprogramming.
Static members are the hardest to support:
As a static member's initialization may depend on a global variable's initialization and vice versa, we have to put their initialization inside the `globals_t` struct.
This complicates the whole thing slightly, as we now need to support access specifiers (`protected` is the hardest to support in a scalable way; I think it requires getting a typelist of all bases of a given type at compile time, which is not yet standard), "templatize" `globals_t` somewhat (for static non-inline members of class templates), and handle name collisions (as we put static members of different classes in the same `globals_t` structure).
Note that this is achievable using heavy template metaprogramming and by introducing a global function template `statics<T>()` to get the static members of some type.
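To make the `statics<T>()` idea a bit more concrete, here is a deliberately simplified sketch. It assumes a per-class `statics_t<T>` specialization (a hypothetical name) that lives on its own instead of inside `globals_t`, and it only emulates `private` access through `friend`; the `protected` case and class templates are not attempted. `Widget`, `Gadget`, and `live_widgets_during_scope()` are illustrative.

```cpp
#include <cassert>

// Hypothetical: one specialization per class, holding that class's
// former static data members.
template <class T>
struct statics_t;

template <class T>
statics_t<T>& statics() {
    static statics_t<T> ret;
    return ret;
}

class Widget;
class Gadget;

template <>
struct statics_t<Widget> {
private:
    friend class Widget;  // emulates a `private` static member
    int count = 0;
};

template <>
struct statics_t<Gadget> {
private:
    friend class Gadget;
    int count = 100;      // same name as Widget's member, no collision
};

class Widget {
public:
    Widget() { ++statics<Widget>().count; }
    ~Widget() {
        assert(statics<Widget>().count > 0);
        --statics<Widget>().count;
    }
    static int count() { return statics<Widget>().count; }
};

class Gadget {
public:
    static int count() { return statics<Gadget>().count; }
};

// Two live Widgets are counted through statics<Widget>().
inline int live_widgets_during_scope() {
    Widget first;
    Widget second;
    return Widget::count();
}
```

Keying the storage on the class type sidesteps name collisions in this sketch; folding everything into a single `globals_t`, as described above, would need extra machinery on top.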
Note that for backward compatibility, we can replace any global variable by a reference to the member inside `get_globals()`, and the order is still well defined, i.e. if we recompile A after the transformation and add all these references, the old B, C or D would still compile just fine, even if they are not updated (thanks to the legacy references).
I'm leaving the details of all of these out of here for now, as I want to discuss the merit of this proposal before delving into a complicated library implementation.
Advantages
- It solves the static initialization order fiasco
- It has zero runtime overhead
- It can be automated
- Every access to global variables would look like `globals().var`, i.e. the reader (both human and compiler) could detect `pure`ness very simply.
- We get static constructors/destructors for free: put them in the bodies of the ctor and dtor of `globals_t`
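The last two points can be illustrated with a small sketch. It assumes a single, non-template `globals_t` for brevity, and the `squares` table and `square_of()` helper are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct globals_t {
    std::vector<int> squares;  // a global that needs non-trivial setup

    globals_t() {
        // "Static constructor": arbitrary setup code that runs once, after
        // the member initializers and before any access through globals().
        for (int i = 0; i < 8; ++i) squares.push_back(i * i);
    }

    ~globals_t() {
        // "Static destructor": runs once at program exit, before the
        // members themselves are destroyed.
        squares.clear();
    }
};

inline globals_t& globals() {
    static globals_t ret;
    return ret;
}

// Every access to a global is spelled `globals().var`, so a function that
// never mentions globals() visibly does not touch global state.
inline int square_of(std::size_t i) {
    assert(i < globals().squares.size());
    return globals().squares[i];
}
```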
Disadvantages
- It does not solve the static deinitialization order fiasco, but forces an order instead of an undefined one (which is better IMO). But this is a breaking change for a code base: if the "correct" deinitialization order is necessarily different from the initialization one, then it would stay broken. It can be supported with some ugliness (separating the ctor/dtor of the globals from their space's allocation).
- It requires one place in the translation unit where all globals are known (the place where `globals_t` is defined).
- The template metaprogramming might take a toll on compile times (preserving access specifiers of static members through an inheritance hierarchy is not very scalable).
- We have to wrap a lot of the code here in ugly macros to hide all the gory template metaprogramming details (and the `naked` and `weak` tricks, the multiple inheritance, etc.). Reflection and Metaclasses would remove all this ugliness, and provide a (complicated) library-only solution.
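The "ugliness" mentioned in the first disadvantage, separating construction and destruction from storage allocation, might look roughly like the sketch below. The `slot<T>` wrapper, the `Tracked` type, and `teardown_demo()` are hypothetical; the point is only that the destructor body can pick a teardown order other than the default reverse-of-declaration one.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>
#include <vector>

inline std::vector<std::string>& event_log() {
    static std::vector<std::string> log;
    return log;
}

// Illustrative global type that records its construction and destruction.
struct Tracked {
    std::string name;
    explicit Tracked(std::string n) : name(std::move(n)) {
        event_log().push_back("ctor " + name);
    }
    ~Tracked() { event_log().push_back("dtor " + name); }
};

// Raw, suitably aligned storage; the object's lifetime is managed by hand.
template <class T>
class slot {
    alignas(T) unsigned char buf_[sizeof(T)];
    T* ptr_ = nullptr;

public:
    template <class... Args>
    T& construct(Args&&... args) {
        ptr_ = ::new (static_cast<void*>(buf_)) T(std::forward<Args>(args)...);
        return *ptr_;
    }
    void destroy() {
        assert(ptr_ != nullptr);
        ptr_->~T();
        ptr_ = nullptr;
    }
    T& get() { return *ptr_; }
};

struct globals_with_custom_teardown {
    slot<Tracked> first;
    slot<Tracked> second;

    globals_with_custom_teardown() {
        first.construct("first");
        second.construct("second");
    }
    ~globals_with_custom_teardown() {
        // Chosen explicitly: same order as construction, not the reverse.
        first.destroy();
        second.destroy();
    }
};

// Builds and tears down one instance, returning the recorded events.
inline std::vector<std::string> teardown_demo() {
    event_log().clear();
    { globals_with_custom_teardown g; }
    return event_log();
}
```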
Proposals
This comprises three parts:
1. [Core] Add `weak` and `naked` context-sensitive keywords (they have an observable effect on the program, and therefore cannot be attributes according to the guidelines).
The behaviour of these keywords in the standard would be defined by exceptions to the ODR.
2. [Library] Add to the standard library the needed ingredients for this construct.
3. [SG15] Some description of the code transformation of the tool (I'm not sure what can actually be proposed there).
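For readers unfamiliar with weak definitions: GCC and Clang provide a non-standard `weak` attribute for ordinary functions. The sketch below shows only that existing extension, not the proposed keyword and not the `naked` part; the function name `globals_provider()` is made up.

```cpp
#include <cassert>
#include <cstring>

// GCC/Clang extension, not standard C++: a weak definition may be replaced
// at link time by a regular (strong) definition of the same function that
// comes from another translation unit.
__attribute__((weak)) const char* globals_provider() {
    return "default";
}

// A dependent TU could supply the overriding definition, for example:
//   const char* globals_provider() { return "Logger"; }
// The linker would then pick that one and drop the weak default.

// True when no other TU has overridden the weak default.
inline bool using_default_provider() {
    assert(globals_provider() != nullptr);
    return std::strcmp(globals_provider(), "default") == 0;
}
```

Built as a single TU, `using_default_provider()` returns `true`, because nothing overrides the weak definition.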
The motivation to put the library in the standard as well is that in the (optimistic, unrealistic) future where the access to globals everywhere is via `lib::globals()`, this "lib" should very definitely be "std", especially as this library would be a very complex template machine which might need compiler hooks for an optimized implementation.
Note I have not written any details for the proposal as I want to discuss merit beforehand.