Document number: TODO
Date: yyyy-mm-dd
Audience: Core Language Evolution Working Group
Reply-to: Samuel Ammonius
----------------

C++ Common ABI Specification
Draft 1

Table of Contents:
- Introduction
- Impact on the Standard
- Motivation and Scope
- Technical Specifications
- Design Decisions

INTRODUCTION:

Many widely used libraries are built on years of C++ code. Although that code is usually portable across platforms, it is difficult to combine C++ code built by different compilers, or to call it from other languages. This is because the C++ standard says little about the low-level form of the code a program produces, such as symbol names and object layout, and leaves those details to each compiler. This proposal addresses the problem by specifying a common way of naming symbols and of managing classes and their member functions.

IMPACT ON THE STANDARD:

This proposal has no impact on existing C++ features and is a pure extension.

MOTIVATION AND SCOPE:

The key motivation for this proposal is that, while extern "C" does make it possible for other programming languages and compilers to use a C++ library, retrofitting an extern "C" interface onto an existing library is extremely difficult. When a library is brand new, concerns such as version control and cross-compiler or cross-language support are usually not a priority, and the library is almost always linked statically and used from a single programming language. By the time the library has grown enough that developers in other languages want to use it, it is usually no longer linked statically, so binary compatibility has become an issue.

For most compiled languages this is not a problem, because they have a fixed scheme for naming function symbols. C++ currently does not, so requests for bindings to other languages are often declined. When there is enough demand and enough contributors, a script is sometimes written to generate wrappers for a particular language automatically.
This proposal is intended for developers who want to use a C++ library from another compiler or language, and for library developers who want to use advanced C++ features (namespaces, classes, overloading) while still supporting other languages and compilers. It gives developers more freedom in how they use and share C++ programs.

TECHNICAL SPECIFICATIONS:

The namespace(s) and class of a function or variable are prepended to its name, separated by underscores, to form its symbol.

    namespace geometry {
        class circle {
            static int radius;
        };
    }

In this case, "radius" has the symbol "geometry_circle_radius". Variables defined in the global scope have the same symbol they would have with extern "C". This also applies to statement labels (used by "goto").

In addition, a function's symbol begins with its return type, unless it returns "void", and ends with its parameter types, unless it has none.

    int square(int i) { return i * i; }

Here, "square" has the symbol "int_square_int". In these type names, "const" becomes "CONST", "*" becomes "PTR", and "&" becomes "REF". "CONST" is placed before the type it qualifies, while "PTR" and "REF" are placed after it.

    int square(const int *i) { return (*i) * (*i); }

In this case, "square" has the symbol "int_square_CONSTintPTR".

Struct and union names are also included in the symbol, but typedefs are expanded to the types they represent. Enums are stored as an integer type chosen by the compiler.

The symbols of constructors and destructors are "new_Object" and "delete_Object", where "Object" is the class name.

In cases where multiple functions have the same symbol, compilers may use their own methods of differentiating between them.
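The naming rules above can be summarized as a small function that builds a symbol from its parts. This is a minimal sketch, not part of the proposal: the TypeDesc structure and the encode_type, mangle_name, and mangle_function helpers are hypothetical, only the modifiers named above (const, *, &) are modelled, and, because every example in this draft has at most one parameter, joining several parameter types with underscores is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical type descriptor (not part of the draft). Only the modifiers
// the draft names are modelled: a leading "const" and trailing '*' / '&'.
// Typedefs are assumed to be expanded already, as the draft requires.
struct TypeDesc {
    bool is_const = false;    // encoded as "CONST" before the type name
    std::string base;         // e.g. "int" or a struct/union name
    std::string declarators;  // e.g. "*" or "*&"; '*' -> "PTR", '&' -> "REF"
};

// Encode one type, e.g. {true, "int", "*"} -> "CONSTintPTR".
std::string encode_type(const TypeDesc& t) {
    std::string out = t.is_const ? "CONST" : "";
    out += t.base;
    for (char d : t.declarators)
        out += (d == '*') ? "PTR" : "REF";
    return out;
}

// Namespaces and classes are prepended to the name, separated by '_'.
// This alone gives the symbol of a variable such as geometry::circle::radius.
std::string mangle_name(const std::vector<std::string>& scopes,
                        const std::string& name) {
    std::string sym;
    for (const std::string& s : scopes)
        sym += s + "_";
    return sym + name;
}

// Function symbol: return type first (omitted for plain void), then the
// qualified name, then the parameter types (omitted when there are none).
// Assumption: each parameter type is preceded by '_'; the draft only
// shows single-parameter examples.
std::string mangle_function(const std::vector<std::string>& scopes,
                            const std::string& name,
                            const TypeDesc& ret,
                            const std::vector<TypeDesc>& params) {
    std::string sym;
    bool plain_void = ret.base == "void" && !ret.is_const && ret.declarators.empty();
    if (!plain_void)
        sym += encode_type(ret) + "_";
    sym += mangle_name(scopes, name);
    for (const TypeDesc& p : params)
        sym += "_" + encode_type(p);
    return sym;
}
```

For instance, both bla::foo(int) and bla::foo_int() produce "bla_foo_int" with this sketch, which is exactly the kind of clash the draft leaves to the compiler.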
Cases where two functions share a symbol may look like this:

    namespace bla {
        void foo(int i) {}    // bla_foo_int
        void foo_int() {}     // bla_foo_int
    }
    void bla_foo(int i) {}    // bla_foo_int
    void bla_foo_int() {}     // bla_foo_int

    struct CONSTintPTR { bool e; };
    void bar(const int *i) {}     // bar_CONSTintPTR
    void bar(CONSTintPTR b) {}    // bar_CONSTintPTR

Of course, programs should avoid names like these even when compatibility with other compilers and programming languages is not needed.

Moving on from symbols, classes are stored in memory exactly like structs, with derived classes storing their base classes as members.

    class shape {
    public:
        uint32_t color;
    };
    class square : public shape {
        int width;
        int height;
    };

    // ...is identical to...

    struct shape { uint32_t color; };
    struct square {
        struct shape s;
        int width;
        int height;
    };

Base classes are stored in memory in the order in which they are declared:

    class foo : public a, public b, public c {
        int bla;
    };

    // ...is identical to...

    struct foo {
        struct a base_a;
        struct b base_b;
        struct c base_c;
        int bla;
    };

Member functions take the "this" pointer as their first argument. This includes constructors and destructors, but not static member functions.

    class Ball {
    public:
        Ball();
        void bounce(int height);
    };

Here, the constructor is equivalent to the C function "void new_Ball(Ball *)", and "Ball::bounce" to "void Ball_bounce_int(Ball *, int height)".

An implementation that follows all of these rules defines the macro "__cpp_common_abi".

DESIGN DECISIONS:

All symbols follow the identifier rules that C++ and most other programming languages use today: they contain only letters, digits, and underscores. This lets other programming languages declare and use C++ functions and variables directly.

    // C++ library function
    int square(int i) { return i * i; }

    /* C caller */
    int int_square_int(int i);
    #define square(I) int_square_int(I)

Symbols are also designed to be human-readable without much effort. The square function above could instead have been given a symbol such as "rint__fsquaref__int". While a scheme like this lowers the risk of two functions having the same symbol, it would make binding to another language much more work.
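Until a compiler implements this scheme, a library can approximate it for an individual function by exporting the proposed symbol itself. The sketch below assumes that approach purely for illustration: it uses the proposed __cpp_common_abi macro to skip the hand-written extern "C" wrapper, since a conforming compiler would already give square the symbol "int_square_int" and the wrapper would then clash with it.

```cpp
#include <cassert>

// The C++ library function from the example above.
int square(int i) { return i * i; }

#ifndef __cpp_common_abi
// No common ABI available (the macro is only proposed in this draft):
// export the symbol the draft would assign, "int_square_int", by hand
// with extern "C" so the C caller above can link against it.
extern "C" int int_square_int(int i) { return square(i); }
#endif
```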
Another design decision was to avoid solving impractical problems with impractical solutions, which is why resolving identical symbols is left to the compiler.
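The class layout rules from the technical specifications can be checked on the lowered structs themselves. This sketch writes out by hand the structs that the draft says square and foo are identical to; the contents of the bases a, b, and c and the member names base_a, base_b, and base_c are hypothetical. It only checks these lowered structs, not how any current compiler lays out the original classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Single inheritance: class square : public shape { int width; int height; };
struct shape  { std::uint32_t color; };
struct square { shape s; int width; int height; };  // base stored first

// Multiple inheritance: class foo : public a, public b, public c { int bla; };
// Hypothetical base contents, chosen only for illustration.
struct a { int x; };
struct b { int y; };
struct c { int z; };
struct foo { a base_a; b base_b; c base_c; int bla; };  // declaration order

// These lowered structs are standard-layout, so offsetof is well-defined.
static_assert(offsetof(square, s) == 0, "base class comes first");
static_assert(offsetof(square, width) >= sizeof(shape), "members follow the base");
static_assert(offsetof(foo, base_a) < offsetof(foo, base_b), "a before b");
static_assert(offsetof(foo, base_b) < offsetof(foo, base_c), "b before c");
static_assert(offsetof(foo, base_c) < offsetof(foo, bla), "bases before own members");
```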
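Likewise, the member-function rule can be written out by hand as plain functions with C linkage that take the object pointer first. In this sketch, the data member last_height and the function bodies are hypothetical, added only so the functions have something to do; the names new_Ball and Ball_bounce_int apply the draft's naming rules to the Ball example.

```cpp
#include <cassert>

// Hand-lowered form of:
//   class Ball { public: Ball(); void bounce(int height); };
// last_height is a hypothetical member used only for illustration.
extern "C" {
struct Ball {
    int last_height;
};

// Constructor: "this" becomes the first (here, the only) parameter.
// Symbol per the draft: "new_" followed by the class name.
void new_Ball(struct Ball* self) { self->last_height = 0; }

// Member function: "this" first, then the declared parameters.
// Symbol per the draft: class prefix, name, then the parameter type
// (no return type, because bounce returns void).
void Ball_bounce_int(struct Ball* self, int height) { self->last_height = height; }
}
```

A C program would call these exactly as written, passing the address of a Ball as the first argument.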