Re: [std-proposals] Standard support for different ABI's for class vtables

From: Frederick Virchanza Gotham <cauldwell.thomas_at_[hidden]>
Date: Tue, 3 Jun 2025 00:23:54 +0100
On Mon, Jun 2, 2025 at 6:29 PM Oliver Hunt wrote:
>
> It could be 6 nines, but if your position is “I don’t like the ABI used for
> C++ polymorphism on Windows and therefore it should be ignored”, it is
> DoA, as you’re arguing with reality.


What I'm saying is that Microsoft's implementation of polymorphism is
the worst of all the C++ compilers, and that the C++ standardisation
process is burdened by this. If you were to take Microsoft out of the
picture, more would be possible when you have a pointer to a
type-erased polymorphic object. (I've already pointed out two things
that would become possible).


> > -- any polymorphic
> > object will always have a pointer to its polymorphic facilitator
> > located at address [base + 0x00] inside the object. What this means,
> > is that when we have a "void*", we _can_ actually do the following two
> > things even though the Standard doesn't let us:
>
> You are correct, the standard does not let you do this. That you want to is irrelevant.


But the Standard can very easily allow this.


> > The above GodBolt works on every C++ compiler ever made. Except for
> > one. Microsoft.
> >
>
> Incorrect, this fails on Darwin - the process is terminated the moment you try
> to use the vtable of a non-Dummy typed object as a Dummy. This is not a recoverable error.


What's Darwin? Are you talking about Apple macOS? I don't have an
Apple computer but I have macOS 15 Sequoia running in a virtual
machine here on my x86_64 laptop. I took that C++ code, put it in a
source file, and I did the following at the command line:

    healytpk_at_Thomas-MacBook-Pro Desktop % clang++ -o prog main.cpp
    healytpk_at_Thomas-MacBook-Pro Desktop % ./prog
    Address of stringstream: 0x7ff7b92b42e8
    Address of ostream: 0x7ff7b92b42f8
    Address of most derived: 0x7ff7b92b42e8
               Typeinfo: NSt3__118basic_stringstreamIcNS_11char_traitsIcEENS_9allocatorIcEEEE

Doesn't crash. Runs fine.


> > On Microsoft it will work properly the vast majority of the time, but
> > sometimes it will crash because of the following fact:
> > "Where as 99% of compilers have a uniform way of mapping
> > an object to its polymorphic facilitator, the Microsoft
> > compiler does not have a uniform way -- it can differ by type."
> >
> > To bring that a bit more down to Earth:
> >
> > "The Microsoft compiler doesn't always place the
> > vtable pointer at the very beginning of the object."
>
> No, you are assuming the layout of the polymorphic facilitator is such
> that unrelated types can be used interchangeably, which is UB.


Computers aren't magical. I know some people like to get all
metaphysical and airy-fairy about UB, talking about "demons out your
nose" and all that, but really your computer isn't going to grow legs
and walk away. Have a think about what the compiler will do, and take
a look at the machine code / assembler it produces. And if you want to
remove UB from the picture entirely, then just write it in assembler
instead of C++. The following assembler function can get the
'type_info' of any type-erased polymorphic object on an x86_64 machine
using the System V ABI:

        GetTypeInfo:
            mov rax, qword ptr [rdi]
            mov rax, qword ptr [rax - 8]
            ret

There are no surprises in the above three instructions. No undefined behaviour.
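
For readers who would rather see those two loads in C++, here is a rough sketch. It assumes the Itanium C++ ABI (what GCC and Clang use on Linux and macOS), where the vtable pointer sits at offset 0 of the object and the two slots just before the vtable's address point hold the type_info pointer and the offset-to-top. The function names are invented for illustration. This is not portable C++: it would not work with the Microsoft ABI, and it would not be expected to survive signed vtable pointers or relative vtables.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sstream>
#include <typeinfo>

// ASSUMPTION: Itanium C++ ABI with plain (unsigned, absolute) vtable pointers.
// 'obj' must point at a polymorphic (sub)object; nothing here can check that.
inline const std::type_info& itanium_type_info(const void* obj) {
    const char* vptr;                        // address point of the vtable
    std::memcpy(&vptr, obj, sizeof vptr);    // first load: [obj + 0]
    const std::type_info* ti;
    std::memcpy(&ti, vptr - sizeof(void*), sizeof ti);  // second load: one slot back
    return *ti;
}

inline const void* itanium_most_derived(const void* obj) {
    const char* vptr;
    std::memcpy(&vptr, obj, sizeof vptr);
    std::ptrdiff_t offset_to_top;            // slot just before the type_info pointer
    std::memcpy(&offset_to_top, vptr - 2 * sizeof(void*), sizeof offset_to_top);
    return static_cast<const char*>(obj) + offset_to_top;
}
```

Given `std::stringstream ss; std::ostream* os = &ss;`, passing `os` as a `const void*` would be expected, on such a platform, to yield `typeid(std::stringstream)` and the address of `ss`.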


> Your code is incorrect the moment you static_cast AnythingAtAll* to Dummy*,
> I’m surprised none of the sanitizers are tripped by this.


And even if it did trip something, I'd figure it out with a
combination of 'volatile' and 'std::launder'. Or I'd just write it in
assembler.

By the way 'volatile' is the best thing in the world if you ever need
to take advantage of UB.


> Alas godbolt does not run on hardware that supports pointer auth (or the related , but we can set the appropriate flags:
>
> https://godbolt.org/z/4Gb14srK5
>
> Then in the body of GetTypeInfo you can see the `autda` that fails and triggers process termination.


That GodBolt you gave me, it does indeed crash . . . but it also
crashes on Hello World, check it out:

    https://godbolt.org/z/fd4K7T9KT

If you can find me a compiler that doesn't crash on Hello World, but
does crash on my code, then I'll take a look. And I bet ya I can
figure it out and get it working.


> > The above GodBolt shows that the Microsoft compiler places the vtable
> > pointer _after_ the non-polymorphic base, specifically at [base +
> > 0x08].
>
> So?
>
> Your code has broken because you decided to try and treat the layout of two
> unrelated objects as interchangeable. That’s simply incorrect.


It's not a big deal if you have an understanding of what the compiler
will do. I know that the compiler will only look at the vtableptr in
that object. It's harmless. I've checked the machine code / assembler
to be sure. But again, we can write it in less than ten CPU
instructions in assembler if you really want to eradicate all UB.
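
The layout difference under discussion can be probed with a small test. The sketch below is not the code from the GodBolt link (which is not reproduced in this message); the types `Plain` and `Poly` are hypothetical and exist only to show where a non-polymorphic base lands inside the first polymorphic class in a hierarchy.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types, chosen only to illustrate the layout question.
struct Plain { long long value = 0; };               // no virtual functions
struct Poly : Plain { virtual ~Poly() = default; };  // first polymorphic class

// Byte offset of the non-polymorphic base subobject inside a Poly.
// 0 means the base comes first, so the vtable pointer must sit after it
// (the layout attributed to the Microsoft compiler in this thread).
// A non-zero result means something, typically the vtable pointer,
// precedes the base.
inline std::ptrdiff_t plain_base_offset() {
    Poly p;
    const char* whole = reinterpret_cast<const char*>(&p);
    const char* base  = reinterpret_cast<const char*>(static_cast<Plain*>(&p));
    return base - whole;
}
```

On an Itanium-ABI compiler one would expect the result to equal `sizeof(void*)`; on the Microsoft compiler, going by the quoted description, one would expect 0.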


> > So it's because of the Microsoft compiler -- and _only_ because of the
> > Microsoft compiler -- that we can't do the two things I talk about
> > above. But last night I figured out a possible solution to this
> > conundrum.
>
> No, it’s because that is not how the language works, and you are writing code
> that is wrong. Also again, this fails on darwin, and it will fail on linux once linux
> adopts pointer authentication. I suspect it would also fail on a hypothetical CHERI
> system for similar reasons.


My code doesn't cause address sanitizer to throw up an alarm. If you
can beat address sanitizer then you can beat anything. Here's how I
build my programs on Linux to debug them:

    set(ENV{ASAN_OPTIONS} "detect_invalid_pointer_pairs=2")
    set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -ggdb3
-pedantic -Wall -Wextra -rdynamic -funwind-tables
-fno-omit-frame-pointer -fno-common -pthread
-fsanitize=address,leak,undefined,pointer-compare,pointer-subtract,float-divide-by-zero,float-cast-overflow
-fsanitize-address-use-after-scope -fcf-protection=full
-fstack-protector-all -fstack-clash-protection -fvtv-debug
-fvtv-counts -finstrument-functions -D_GLIBC_DEBUG
-D_GLIBC_DEBUG_PEDANTIC -D_GLIBCXX_DEBUG -D_GLIBCXX_DEBUG_PEDANTIC")

Not many a bug survives all those options. However my code for
GetMostDerivedObject and GetTypeInfo compiles and runs just fine.


> > Here comes the fun part, the reverse-engineering -- which is necessary because of the
> > burden Microsoft has placed on the C++ standardisation process.
>
> No, MS has not placed a burden here, as you are demanding the standard
> specify exact behavior of a language implementation detail that is not
> described anywhere in the standard.


Just because it's not described anywhere in the standard, doesn't mean
that engineers can come up with really backwards solutions that burden
the rest of the planet. Not putting the vtable pointer at the
beginning of the object . . . I mean really . . . it's up there with
naming the 64-Bit version of their kernel library, "kernel32.dll", for
backward-compatibility reasons. Or referring to x86_32 as "x86", and
referring to x86_64 as "x64" -- the latter is clear but the former has
you wondering emmmm do they mean 32-Bit x86 or 64-Bit x86 . . . and
then you open up Visual Studio and they call it "Win32" instead of
"x86", even though "_WIN32" is also defined when you build in 64-Bit
mode. And now they have 64-Bit ARM so you have to remember that x64 is
really x86_64 and not arm64/aarch64.
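
As an aside, the naming muddle can be sidestepped in code by testing the architecture macros directly instead of `_WIN32`. The following is a minimal sketch; the function name `target_arch` and the returned strings are made up, while the macros are the ones predefined by the Microsoft compiler (`_M_*`) and by GCC/Clang.

```cpp
#include <cassert>
#include <cstring>

// _WIN32 is defined for both 32-bit and 64-bit Windows targets, so it says
// nothing about the architecture; these macros do.
constexpr const char* target_arch() {
#if defined(_M_X64) || defined(__x86_64__)
    return "x86_64";    // what Visual Studio calls "x64"
#elif defined(_M_ARM64) || defined(__aarch64__)
    return "arm64";
#elif defined(_M_IX86) || defined(__i386__)
    return "x86_32";    // what Visual Studio calls "x86" or "Win32"
#else
    return "unknown";
#endif
}
```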


> And again, your constant “this works everywhere else” is wrong.
>
> There is a reason why what you are doing is UB, and is precisely
> because the false assumptions in your code are not sound.


It's a proof of concept, that's all. I'm showing that this can already
be done without an ABI break and without compiler vendors doing much
work. It just has to be written into the Standard. It can be
implemented in assembler if you really want to excise the UB.


> What you are currently asking for is:
>
> 1. I want to be able to say “use this ABI instead of that ABI for this class”
> 2. Then I want the standard to make it not defined behavior to cast and use
> a polymorphic object of one type as an object of an unrelated type. That means
> type aliasing optimizations cease to become valid,


Yeah 1 and 2 aren't really related . . . I started talking about 1 and
then went off on a tangent talking about 2.


> (1) is (in principle) easy: that could be a vendor attribute, just as
> things like “fastcall”, regparam, etc are today
>
> (2) just breaks: the fact that an object of one type cannot be used
> as an object of an unrelated type is a fairly fundamental feature of C and C++.


If you know where the vtable pointer is inside an object, you can have
a field day with it . . . you can find its type_info . . . you can
find its most-derived object . . . . all of this stuff is already
possible, it just needs to be written into the Standard. I thought it
wasn't possible with Microsoft but it actually is if you use a
'polymorph_handle' instead of a ' void * '.

If you're really insistent that my proof of concept shouldn't have UB
then I'm happy to write it in assembler for every architecture . . . I
mean I wrote assembler for 64-Bit ARM, HPPA, Motorola 68K and also
SuperH for this paper alone:
http://www.virjacode.com/papers/paper_nrvo_latest.pdf


> What you should be asking for is not “I want an ability to override the
> polymorphism implementation, because then I can use definitionally
> erroneous code that happens to work on my own platform, to achieve
> what I want”. You have fixated on your specific implementation, and
> have correctly identified that your reliance on UB breaks it on other platforms.


Not sure what you're talking about here. I haven't conceded that my
code malfunctions anywhere (unless you're talking about the
quick-and-dirty code I wrote to analyse the machine code at runtime on
the Microsoft compiler?).


> Rather than asking for language changes to support your incorrect code,
> you should be asking for language features to support what you are actually
> trying to do. Not all such language features are possible.
>
> So what is it you are actually trying to do?


Here's what I want, I'll spell it out. If you start off with a
polymorphic object:

    std::stringstream ss;

And then if you type-erase it:

    void *pv = &ss;

I want to be able to take that type-erased pointer and do the following:
    (1) Get the address of the most-derived object
    (2) Get the type_info

These two things are very easily possible on every single C++ compiler
in the world -- except for Microsoft. On the Microsoft compiler you
need to get a little creative, which is what I've done with
"std::polymorph_handle". Using a 'polymorph_handle' on every compiler
other than Microsoft will be no different than dealing with a simple
"void *". So if you've already written code that has a vector<void*>,
then it won't be an ABI break if you change it to
vector<polymorph_handle>. And as for the Microsoft compiler, well
something that was previously impossible has become possible so
there's no worries about breaking ABI.

I realise that Microsoft software is on 1.45 billion computers
worldwide. But that figure doesn't somehow, some way, exonerate them
of bad engineering. If Microsoft are either unwilling or unable to
change their ABI, then an alternative solution is my proposed
'polymorph_handle', which I'm thinking just now I'll shorten to
'std::polyhandle'.

So if we deal with 'std::polyhandle' instead of ' void * ', we're
working with a type-erased pointer to a polymorphic object and the
following two things become possible:
    (1) Get the address of the most-derived object
    (2) Get the type_info
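
To make the two operations concrete, here is a portable stand-in built only from facilities the Standard already has. It is explicitly NOT the proposed 'std::polyhandle': the class name `poly_handle_sketch` is hypothetical, and it works by recording both answers while the static type is still known, so it is larger than a "void *" and would not have the ABI-compatibility property described above.

```cpp
#include <cassert>
#include <sstream>
#include <type_traits>
#include <typeinfo>

// Hypothetical illustration of the desired interface, not the proposal itself.
class poly_handle_sketch {
    const void*           most_derived_ = nullptr;
    const std::type_info* type_         = &typeid(void);
public:
    poly_handle_sketch() = default;

    template <class T>
    explicit poly_handle_sketch(const T* p) {
        static_assert(std::is_polymorphic<T>::value,
                      "poly_handle_sketch needs a polymorphic type");
        if (p) {
            most_derived_ = dynamic_cast<const void*>(p);  // (1) most-derived object
            type_         = &typeid(*p);                   // (2) dynamic type_info
        }
    }

    const void* most_derived() const { return most_derived_; }
    const std::type_info& type() const { return *type_; }
};
```

Constructed from a `std::ostream*` that points into a `std::stringstream`, the handle reports the address of the whole stringstream and `typeid(std::stringstream)`. The proposal differs in wanting the same answers from a handle that carries no extra data on non-Microsoft compilers.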

That's what I want. And I bet that we would have had this
functionality in the Standard many, many years ago if it hadn't been
for Microsoft with their lacklustre, burdensome ABI.

Received on 2025-06-02 23:24:02