ISOCPP std-proposals List: Re: [std-proposals] Standard support for different ABI's for class vtables

From: Frederick Virchanza Gotham <cauldwell.thomas_at_[hidden]>
Date: Mon, 2 Jun 2025 15:43:15 +0100

Okay there's been talk in this thread about how the Standard doesn't
mandate the use of vtables nor even acknowledge the existence of
vtables. That's fair enough, we can keep things abstract here. An
object of a polymorphic class type will have a link to some sort of
"polymorphic facilitator" -- which today in 2025 for all C++ compilers
is a vtable. But we can stay abstract and call it a polymorphic
facilitator.

The polymorphic facilitator can be inside the object -- e.g. in the
form of a pointer to the vtable, or alternatively there could be a
global container something like:

    // make sure to protect with mutex
    std::map< void*, void* > g_pointers_to_polymorphic_facilitators;

Any system of mapping an object to its polymorphic facilitator is fine
so long as the compiler, when it's given a pointer to an object of
type T, knows how to find the polymorphic facilitator. Although the
global 'std::map' method might not work with trivial destructors
because the destructor would have to update the map (thereby making
itself nontrivial) -- and so maybe the polymorphic facilitator (or the
link to the polymorphic facilitator) has to be inside the polymorphic
object.
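
To make the side-table idea concrete, here is a minimal sketch. The
names FacilitatorRegistry and Tracked are made up for illustration,
and the "facilitator" is modelled as nothing more than a pointer to a
std::type_info. It only shows why the destructor stops being trivial:
it has to erase the entry from the table.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <type_traits>
#include <typeinfo>

// Hypothetical side table: object address -> "polymorphic facilitator".
// The facilitator is modelled here as a std::type_info pointer.
class FacilitatorRegistry {
    std::mutex mtx_;
    std::map<void const*, std::type_info const*> table_;
public:
    static FacilitatorRegistry& instance()
    {
        static FacilitatorRegistry r;
        return r;
    }
    void add(void const* obj, std::type_info const& ti)
    {
        std::lock_guard<std::mutex> lock(mtx_);
        table_[obj] = &ti;
    }
    void remove(void const* obj)
    {
        std::lock_guard<std::mutex> lock(mtx_);
        std::size_t const erased = table_.erase(obj);
        assert(erased == 1);   // every live object must have been registered
        (void)erased;
    }
    std::type_info const* find(void const* obj)
    {
        std::lock_guard<std::mutex> lock(mtx_);
        auto const it = table_.find(obj);
        return it == table_.end() ? nullptr : it->second;
    }
    std::size_t size()
    {
        std::lock_guard<std::mutex> lock(mtx_);
        return table_.size();
    }
};

// An object that keeps no link to its facilitator inside itself.
struct Tracked {
    int value = 0;
    Tracked() { FacilitatorRegistry::instance().add(this, typeid(Tracked)); }
    Tracked(Tracked const& other) : value(other.value)
    {
        FacilitatorRegistry::instance().add(this, typeid(Tracked));
    }
    ~Tracked() { FacilitatorRegistry::instance().remove(this); }
};

// The bookkeeping in the destructor is exactly what makes it nontrivial.
static_assert(!std::is_trivially_destructible<Tracked>::value,
              "a registry-backed object cannot have a trivial destructor");
```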

99% of C++ compilers make this system very simple -- any polymorphic
object will always have a pointer to its polymorphic facilitator
located at address [base + 0x00] inside the object. This means that
when we have a "void*", we _can_ actually do the following two things
even though the Standard doesn't let us:

    (1) Get the most-derived object (i.e. dynamic_cast to void*)
    (2) Get the type_info

Here are these two features coded up on GodBolt:

      https://godbolt.org/z/ajcnb8qda
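
For contrast, this is what the Standard does allow when the static
type is known rather than erased to "void*". It is not the code from
the GodBolt link; the Demo* classes and the function names MostDerived
and DynamicType are made up for illustration.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical hierarchy with two polymorphic bases.
struct DemoBase  { virtual ~DemoBase()  = default; int b = 1; };
struct DemoOther { virtual ~DemoOther() = default; int o = 2; };
struct DemoDerived : DemoBase, DemoOther { int d = 3; };

// (1) Most-derived object: well defined for any pointer to a
//     polymorphic type, whichever base subobject it points at.
template<class T>
void* MostDerived(T* p)
{
    assert(p != nullptr);
    return dynamic_cast<void*>(p);
}

// (2) Dynamic type: typeid applied to a glvalue of polymorphic type
//     reports the type of the most-derived object.
template<class T>
std::type_info const& DynamicType(T const& obj)
{
    return typeid(obj);
}
```

Both functions need T to be a polymorphic class type, which is exactly
the information a plain "void*" has thrown away.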

The above GodBolt works on every C++ compiler ever made. Except for
one. Microsoft.

On Microsoft it will work properly the vast majority of the time, but
sometimes it will crash because of the following fact:
    "Whereas 99% of compilers have a uniform way of mapping
     an object to its polymorphic facilitator, the Microsoft
     compiler does not have a uniform way -- it can differ by type."

To bring that a bit more down to Earth:

    "The Microsoft compiler doesn't always place the
     vtable pointer at the very beginning of the object."

and here's an example in the following GodBolt:

      https://godbolt.org/z/44j7or1rG

The above GodBolt shows that the Microsoft compiler places the vtable
pointer _after_ the non-polymorphic base, specifically at [base +
0x08].
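
The layout difference can also be observed without looking at the
assembler, by asking where the non-polymorphic base ends up inside the
object. This is only a sketch with made-up names (PlainBase,
PolyWithPlainBase, OffsetOfPlainBase), not the code from the GodBolt
link. I would expect it to return 0 on the Microsoft compiler (base
first, vtable pointer after it) and sizeof(void*) on compilers that
put the vtable pointer first.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-polymorphic base and a polymorphic class derived from it.
struct PlainBase { long long x = 0; };
struct PolyWithPlainBase : PlainBase { virtual ~PolyWithPlainBase() = default; };

// Byte offset of the PlainBase subobject inside the complete object.
// If the vtable pointer comes first, the base is pushed back by one pointer.
inline std::ptrdiff_t OffsetOfPlainBase(PolyWithPlainBase const& obj)
{
    char const* const whole = reinterpret_cast<char const*>(&obj);
    char const* const base =
        reinterpret_cast<char const*>(static_cast<PlainBase const*>(&obj));
    std::ptrdiff_t const off = base - whole;
    assert(off >= 0);   // a base subobject never starts before the object
    return off;
}
```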

So it's because of the Microsoft compiler -- and _only_ because of the
Microsoft compiler -- that we can't do the two things I talk about
above. But last night I figured out a possible solution to this
conundrum.

On the Microsoft compiler, every type has an RTTICompleteObjectLocator
(sort of like Microsoft's very own personal form of 'std::type_info'):

    struct RTTICompleteObjectLocator {
        uint32_t signature;  // Always 0 for MSVC
        uint32_t offset;     // Offset of the vtable within the complete object
        uint32_t cdOffset;   // Constructor displacement offset
        // Pointer to type_info structure
        struct TypeDescriptor* pTypeDescriptor;
        // Inheritance hierarchy
        struct _RTTIClassHierarchyDescriptor* pClassDescriptor;
    };

Do you see that second member? It's the number that we need in order
to find the vtable inside a polymorphic object. Here comes the fun
part, the reverse-engineering -- which is necessary because of the
burden Microsoft has placed on the C++ standardisation process.

This would be really easy if Microsoft provided an operator or
function that gave us the RTTICompleteObjectLocator for any given
polymorphic type. But they haven't made it that easy (or maybe it _is_
that easy and they just haven't publicly documented the feature).
There isn't even any link going from the 'std::type_info' to the
RTTICompleteObjectLocator either. What I have been able to ascertain
though, is how to determine the linker symbol for the
RTTICompleteObjectLocator object, as follows:

    Name of class: MyClass
    Mangled name of its RTTICompleteObjectLocator: ??_R4MyClass@@6B@
          ??_R4 — Prefix indicating an RTTI Complete Object Locator.
          MyClass@@ — The class name, with MSVC-style name mangling.
          6B@ — Suffix indicating the type of RTTI structure.
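
The recipe above can be written down as a tiny helper. This is a
sketch that only covers the simple case shown here (a non-template
class at global scope); the function name LocatorSymbolFor is made up.

```cpp
#include <cassert>
#include <string>

// Builds the linker symbol of the RTTICompleteObjectLocator for a
// plain class name, following the prefix/name/suffix recipe above.
// Assumption: no namespaces, no nesting, no template arguments.
inline std::string LocatorSymbolFor(std::string const& class_name)
{
    assert(!class_name.empty());
    return "??_R4" + class_name + "@@" + "6B@";
}
```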

Then inside a VC++ source file we can access the
RTTICompleteObjectLocator for 'MyClass' as follows:

    extern "C" void const *const _rtti_locator_MyClass;
    #pragma comment(linker, "/include:??_R4MyClass@@6B@")

Now this is all well and good until we come to templates. How do we
get the RTTICompleteObjectLocator of the following type:

      std::vector< int, MyOwnPersonalAllocatorType >

That one would be tricky.

So . . . instead of trying to work with the linker symbol for the
RTTICompleteObjectLocator, I had an idea come to me last night.
Consider the following function:

    template<class T>
    std::type_info const &GetTypeInfo(T &&obj)
    {
        return typeid(obj);
    }

The above template function is able to get the 'std::type_info' for
any polymorphic type. Therefore, the machine code produced for the
above template function must contain within it the offset to the
vtable. Let's write a GodBolt to see what assembler we get:

    https://godbolt.org/z/Eq7EbbaPK

Here's the assembler we get for GetTypeInfo<Derived1&>:

    mov QWORD PTR [rsp+8], rcx    ; Store rcx ('this' pointer) on stack
    sub rsp, 40                   ; Allocate stack space
    mov rcx, QWORD PTR obj$[rsp]  ; Load object pointer into rcx
                                  ; (prepare for typeid call)
    call __RTtypeid               ; Call MSVC's runtime type
                                  ; identification function
    add rsp, 40                   ; Restore stack space after function call
    ret

Nothing too crazy going on in the above assembler because the vtable
pointer is at [base + 0x00].

But now let's look at the assembler for GetTypeInfo<Derived2&>. The
assembler you'll see on GodBolt checks if the address is null, but
since we're dealing with a reference instead of a pointer, I've
removed the null check, leaving us with the following reduced
assembler:

        mov QWORD PTR [rsp+8], rcx      ; Store rcx ('this' pointer) on stack
        sub rsp, 56                     ; Allocate stack space
        mov rax, QWORD PTR obj$[rsp]    ; Load object pointer into rax
        mov rax, QWORD PTR [rax+8]      ; Get vtable pointer from the object
        movsxd rax, DWORD PTR [rax+4]   ; Sign-extend an offset value
                                        ; from vtable (possibly RTTI)
        mov rcx, QWORD PTR obj$[rsp]    ; Reload object pointer into rcx
        lea rax, QWORD PTR [rcx+rax+8]  ; Compute final RTTI pointer using offset
        mov QWORD PTR tv78[rsp], rax    ; Store computed RTTI pointer in tv78
        mov rcx, QWORD PTR tv78[rsp]    ; Load the computed RTTI pointer into rcx
        call __RTtypeid                 ; Call runtime type identification function
        add rsp, 56                     ; Restore stack space after function call
        ret

Do you see that fourth instruction? This one:

    mov rax, QWORD PTR [rax+8] ; Get vtable pointer from the object

This is what we need. That number 8 is the offset. So at runtime we
can analyse the machine code of GetTypeInfo<Derived2&>, and look for
the first instruction that loads from the address in RAX plus a
numeric constant. But first let's go to the website,
"https://defuse.ca/online-x86-assembler.htm", and type that
instruction in to get the machine code. It gives us back:

   constexpr char unsigned instruction[] = { 0x48, 0x8B, 0x40, 0x08 };
   // the last byte is the offset 8

So now let's write a function to pluck out the offset from the
function's machine code:

    unsigned PluckOutOffset(void const *const pv)
    {
        constexpr char unsigned instruction[] = { 0x48, 0x8B, 0x40, /*0x08*/ };
        char unsigned const *const p = static_cast<char unsigned const*>(pv);

        for ( unsigned n = 0u; ; ++n )
        {
            if ( p[n + 0] == instruction[0] &&
                 p[n + 1] == instruction[1] &&
                 p[n + 2] == instruction[2])
            {
                return p[n + 3];
            }
        }
    }
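
The loop above never gives up, so if the pattern isn't there it walks
off the end of the function. Here is a bounded variant as a sketch.
The name PluckOutOffsetBounded and the idea of passing a maximum
length are my own additions, and it is still just a byte-pattern
search, so it can match in the middle of some other instruction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <optional>

// Looks for "mov rax, [rax + disp8]" (48 8B 40 xx) in the first 'len'
// bytes of 'code' and returns the displacement byte, or nothing if the
// pattern does not occur with its displacement inside the buffer.
inline std::optional<unsigned> PluckOutOffsetBounded(unsigned char const* code,
                                                     std::size_t len)
{
    assert(code != nullptr);

    // REX.W prefix, opcode 8B (mov r64, r/m64), ModRM 40 ([rax + disp8] -> rax)
    static constexpr unsigned char pattern[] = { 0x48, 0x8B, 0x40 };

    unsigned char const* const end = code + len;
    unsigned char const* const hit =
        std::search(code, end, std::begin(pattern), std::end(pattern));

    // Need the three pattern bytes plus the displacement byte.
    if (hit == end || end - hit < 4) return std::nullopt;

    return static_cast<unsigned>(hit[3]);
}
```

Note that the displacement byte is really a signed 8-bit value, and
offsets above 127 would be encoded differently, so this only handles
small positive offsets, the same as the original loop.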

And now we can write a function, which given any object of polymorphic
type, can give us back the offset of the vtable inside the object, as
follows:

    template<class T>
    requires std::is_polymorphic_v< std::remove_cvref_t<T> >
    unsigned GetVTableOffset(T &&obj)
    {
        char unsigned const *const p =
            (char unsigned const*)&GetTypeInfo< std::remove_cvref_t<T>& >;

        // Check if it's a very short function with
        // a return instruction at position 26
        if ( (0xc3==p[26]) && (0xcc==p[27]) ) return 0u;

        // Okay so we have a long function, let's pluck out the offset:
        return PluckOutOffset(p);
    }

We are 50% of the way there. We are able to get the vtable pointer
from the object pointer. But now we need to go the other way: we need
to get the object pointer from the vtable pointer, as follows:

     void *VTable_Pointer_to_Object_Pointer(void const *const pvtable)
     {
          // The pointer to the RTTICompleteObjectLocator is located
          // sizeof(void*) bytes before the vtable:
          unsigned const *const locator = *(*(unsigned***)pvtable - 1);
          // The member 'offset' is the second int inside the
          // RTTICompleteObjectLocator:
          unsigned const *const pn = locator + 1;
          return (char*)pvtable - *pn;
     }
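
The pointer arithmetic in that function can be tried out on any
compiler by faking the data structures. The sketch below uses a mock
locator, a mock vtable and a mock object (all made-up names, none of
this is real MSVC RTTI data), and does the same walk: from the slot
holding the vtable pointer, to the entry just before the vtable, to
the 'offset' field, and back to the start of the object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mock of the first three fields of the locator (illustration only).
struct MockLocator {
    std::uint32_t signature;
    std::uint32_t offset;    // where the vtable pointer sits inside the object
    std::uint32_t cdOffset;
};

// Mock object whose "vtable pointer" is deliberately not the first member.
struct MockObject {
    long long          payload;
    void const* const* vptr;   // points at the first "virtual function" slot
};

// Given the address of the vptr member inside an object, return the
// address of the object itself.
inline void* ObjectFromVptrSlot(void const* vptr_slot)
{
    assert(vptr_slot != nullptr);

    // Read the vtable pointer stored in the slot.
    void const* const* const vtable =
        *static_cast<void const* const* const*>(vptr_slot);

    // The entry just before the first function slot points at the locator.
    MockLocator const* const locator =
        static_cast<MockLocator const*>(vtable[-1]);

    // Step back from the slot by 'offset' bytes to reach the object start.
    char const* const slot = static_cast<char const*>(vptr_slot);
    return const_cast<char*>(slot - locator->offset);
}
```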

So now on the Microsoft compiler, we have a way of:
    (1) Getting the vtable pointer from any pointer-to-object
    (2) Getting the object pointer from any pointer-to-vtable

So now let's write a new class, "std::polymorph_handle", which under
the hood is just a "void*", and which stores the address of the
pointer-to-vtable inside an object:

    struct polymorph_handle {
        void *p;

        template<class T>
        requires std::is_polymorphic_v< std::remove_cvref_t<T> >
        polymorph_handle(T &&arg)
        {
            p = (char*)&arg + GetVTableOffset(arg);
        }

        void *get_pointer_to_object(void) const noexcept
        {
            return VTable_Pointer_to_Object_Pointer(p);
        }
    };

So now let's create a global container of polymorph_handle's:

    std::vector<polymorph_handle> mypolymorphs;

And let's populate it with all different kinds of polymorphic objects,
some with their vtable at [base + 0x00], and some with their vtable at
another location such as [base + 0x08]. Here we go:

      https://godbolt.org/z/jWez7dzjc

The above GodBolt works but it's by no means perfect. I wouldn't have
to analyse the machine code if Microsoft provided a built-in operator
such as:

    __get_rtti_complete_object_locator(T)

(Maybe they actually have such an operator but it's not publicly documented)

Now that we have a working implementation of std::polymorph_handle, we
can now do the following two things:

    (1) Get the most derived object from an std::polymorph_handle
    (2) Get the type_info from an std::polymorph_handle

We could implement these features as member functions as follows:

    struct polymorph_handle {

        void *p;

          . . .
          . . .
          . . .

        std::type_info const &GetTypeInfo(void) const noexcept
        {
            struct Dummy { virtual ~Dummy(void) noexcept = default; };

            Dummy *const pdummy = static_cast<Dummy*>(this->p);

            return typeid(*pdummy);
        }

        void *GetMostDerived(void) const noexcept
        {
            struct Dummy { virtual ~Dummy(void) noexcept = default; };

            Dummy *const pdummy = static_cast<Dummy*>(this->p);

            return dynamic_cast<void*>(pdummy);
        }
    };
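
As a baseline for comparison, a handle with the same two operations
can be written in fully portable C++ today if you are willing to pay
for it: it has to capture the answers while the static type is still
known, so it is bigger than one pointer. This is a different technique
from the one above (no Dummy cast, no vtable offset), and the names
portable_polymorph_handle, HandleFirst, HandleSecond and HandleDerived
are made up.

```cpp
#include <cassert>
#include <type_traits>
#include <typeinfo>

// Hypothetical test hierarchy.
struct HandleFirst  { virtual ~HandleFirst()  = default; int a = 1; };
struct HandleSecond { virtual ~HandleSecond() = default; int b = 2; };
struct HandleDerived : HandleFirst, HandleSecond { int c = 3; };

// Portable baseline: stores the most-derived address and the type_info
// at construction time instead of recovering them from a bare pointer.
class portable_polymorph_handle {
    void* most_derived_;
    std::type_info const* type_;
public:
    template<class T,
             class = typename std::enable_if<std::is_polymorphic<T>::value>::type>
    explicit portable_polymorph_handle(T& obj) noexcept
        : most_derived_(const_cast<void*>(dynamic_cast<void const*>(&obj))),
          type_(&typeid(obj))
    {
        assert(most_derived_ != nullptr);
    }

    void* get_most_derived() const noexcept { return most_derived_; }
    std::type_info const& get_type_info() const noexcept { return *type_; }
};
```

The point of polymorph_handle is to get the same answers out of a
single pointer, which is where the vtable offset comes in.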

And here's a GodBolt testing it out:

    https://godbolt.org/z/6KMzEqaT9

You'll be able to break it though by testing out different classes
because my code to analyse the machine code isn't perfect. I tried to
use it with a 'stringstream' and its sub-objects 'istream' and
'ostream' but it crashed -- presumably because I got the wrong offset
for the vtable pointer. But it works in principle, and Microsoft can
make this really easy by providing
__get_rtti_complete_object_locator(T).

So . . . whereas previously you dealt with a container of "void*"
representing polymorphic objects, you can now use "polymorph_handle"
instead, which has zero overhead on 99% of compilers, and only a tiny
bit of overhead on the Microsoft compiler -- unless of course the
offset lookup can be made consteval.

But here's the icing on the cake: It's not even an ABI break for 99%
of compilers. On 99%, the "void*" for the whole object is the same
as the pointer held by the polymorph_handle. And when it comes to
Microsoft, well we're making something possible that was previously
impossible, so you can't have an ABI break on a new feature!

I might write a paper proposing the addition of std::polymorph_handle
to the language.

P.S. I'll admit that all that assembler messing took me about 6 hours
to get right

Received on 2025-06-02 14:43:28