std-proposals

Re: [std-proposals] Interceptor Function (preserve stack and all registers)

From: Frederick Virchanza Gotham <cauldwell.thomas_at_[hidden]>
Date: Mon, 13 Apr 2026 13:50:46 +0100
On Sun, Apr 12, 2026 at 4:26 AM Thiago Macieira wrote:
>
> > I think that will work perfectly for every conceivable function call
> > under System V x86_64.
>
> Yes, every ABI-respecting one.


Yes, that's what I'm going for here. So for the x86_64 CPU, I will make
it work properly for the interception of both System V and Microsoft
x64 (ms_abi) function calls.

So rather than save the entire CPU state, I'll just preserve the
function-call state: keep the stack intact and keep the argument
registers intact.

The Microsoft x64 argument registers (RCX, RDX, R8, R9 and XMM0-XMM3)
are a subset of the System V ones (RDI, RSI, RDX, RCX, R8, R9 and
XMM0-XMM7), so instead of writing two separate algorithms (one for
Microsoft, one for System V), I'll write one that preserves the System
V set and therefore covers both.
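The subset claim can be checked mechanically. The sketch below lists each ABI's argument registers by name (the array names and the IsSubset helper are illustrative, not part of any proposal) and verifies the relationship at compile time. One caveat worth keeping in mind: System V variadic calls also pass an upper bound on the number of vector registers used in AL, so a combined save would likely need RAX as well.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Integer/pointer argument registers, in argument order.
constexpr std::array<std::string_view, 6> kSysVIntArgs{"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
constexpr std::array<std::string_view, 4> kMsIntArgs{"rcx", "rdx", "r8", "r9"};

// Floating-point/vector argument registers.
constexpr std::array<std::string_view, 8> kSysVVecArgs{"xmm0", "xmm1", "xmm2", "xmm3",
                                                       "xmm4", "xmm5", "xmm6", "xmm7"};
constexpr std::array<std::string_view, 4> kMsVecArgs{"xmm0", "xmm1", "xmm2", "xmm3"};

// True if every register named in 'part' also appears in 'whole'.
template <std::size_t N, std::size_t M>
constexpr bool IsSubset(const std::array<std::string_view, N>& part,
                        const std::array<std::string_view, M>& whole) {
  for (std::string_view reg : part) {
    bool found = false;
    for (std::string_view other : whole) {
      if (reg == other) found = true;
    }
    if (!found) return false;
  }
  return true;
}

// Saving the System V set therefore also saves every Microsoft x64 argument register.
static_assert(IsSubset(kMsIntArgs, kSysVIntArgs));
static_assert(IsSubset(kMsVecArgs, kSysVVecArgs));
```

The converse does not hold (RDI and RSI are arguments only under System V), which is why the combined algorithm has to be built around the larger System V set.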

I don't want to code a solution for x86_64 and then have to start from
scratch for, say, aarch64, which is why I'm also considering 32-bit
x86, where the default cdecl convention passes all arguments on the
stack.

At first I was thinking it would work like this:

[[musttail]] void Func(void)
{
  __builtin_push_args();

  . . .
  . . .
  auto address = GetProcAddress( . . . );
  . . .
  . . .

  return __builtin_pop_args_and_jump( address );
}

On x86_64, when the return statement is reached, 'address' will be
placed in RDI (System V; it would be RCX under the Microsoft ABI).
This isn't a big deal, though, because we're going to restore all the
argument registers anyway.
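A rough model of this first design, assuming a global struct named cpu stands in for the System V integer argument registers. PushArgs and PopArgsAndJump are hypothetical stand-ins for the proposed builtins, and the "jump" is modeled as an ordinary call so that the result can be observed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the six System V integer argument registers.
struct ArgRegs {
  std::uint64_t rdi, rsi, rdx, rcx, r8, r9;
};

ArgRegs cpu{};  // stands in for the live register file

// The "intercepted" function reads its arguments from the registers.
using Target = std::uint64_t (*)(const ArgRegs&);

// Models __builtin_push_args(): snapshot the argument registers.
ArgRegs PushArgs() { return cpu; }

// Models __builtin_pop_args_and_jump(address): restore, then transfer control.
std::uint64_t PopArgsAndJump(const ArgRegs& saved, Target address) {
  cpu = saved;          // every argument register gets its original value back
  return address(cpu);  // the real builtin would jump rather than call
}

std::uint64_t RealFunction(const ArgRegs& r) { return r.rdi + r.rsi; }

std::uint64_t Interceptor() {
  const ArgRegs saved = PushArgs();
  cpu.rdi = 0xDEAD;  // the interceptor body may clobber argument registers,
  cpu.rsi = 0xBEEF;  // e.g. while looking up the target
  Target address = &RealFunction;
  cpu.rdi = 0x1234;  // passing 'address' would occupy RDI...
  return PopArgsAndJump(saved, address);  // ...but RDI is restored before the jump
}
```

With cpu set to {2, 3, 0, 0, 0, 0} on entry, Interceptor() yields 5: RealFunction sees the caller's original RDI and RSI even though both were overwritten in between.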

However, on x86_32 that return statement has to avoid two things:
  (1) moving the stack pointer, and
  (2) overwriting the first argument.

The first argument will be on the stack, not at the top position but
at the second-from-top position, because the return address is at the
top. That is only true if there actually is a first argument, though:
if the function you're intercepting takes no arguments, that slot
belongs to the caller. We don't want to overwrite values on the stack
at all.
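The stack layout at entry can be sketched as a plain array, under the simplifying assumption that element i models the 4-byte word at [ESP + 4*i] and that the word just past the arguments belongs to the caller's frame. MakeEntryStack is a hypothetical helper, not a real API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a 32-bit x86 stack at the entry of a cdecl
// function: element i stands for the word at [ESP + 4*i].
using Stack32 = std::array<std::uint32_t, 8>;

// The caller pushes the arguments right to left and CALL pushes the
// return address, so [ESP] is the return address and [ESP+4] is the
// first argument, if there is one. In this model, the next word after
// the arguments is data from the caller's own frame.
Stack32 MakeEntryStack(std::uint32_t return_address,
                       const std::vector<std::uint32_t>& args,
                       std::uint32_t caller_word) {
  Stack32 stack{};
  stack.at(0) = return_address;
  for (std::size_t i = 0; i < args.size(); ++i) {
    stack.at(1 + i) = args[i];
  }
  stack.at(1 + args.size()) = caller_word;
  return stack;
}
```

For a one-argument callee, slot 1 holds the argument; for a zero-argument callee, slot 1 holds caller data, so writing 'address' there would corrupt the caller.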

So next I was thinking it might have to be something like:

[[musttail]] void Func(void)
{
  __builtin_push_args();

  . . .
  . . .
  auto address = GetProcAddress( . . . );
  . . .
  . . .

  __builtin_set_tailcall_address(address); // On x86_32, store 'address' in EAX
  return __builtin_pop_args_and_jump(); // jump to EAX
}

At first glance this looks okay . . . but what if a destructor gets
called? For instance:

[[musttail]] void Func(void)
{
  __builtin_push_args();

  std::string str;
  str += "dsfghslakdfhsklahfskahfklashfkljash";
  str += "afhsklahfsahflksah;sahfkjashfklshaf";
  str += "afhsklahfsahflksah;sahfkjashfklshaf";
  str += "afhsklahfsahflksah;sahfkjashfklshaf";
  . . .
  . . .
  auto address = GetProcAddress( . . . );
  . . .
  . . .

  __builtin_set_tailcall_address(address); // On x86_32, store 'address' in EAX
  return __builtin_pop_args_and_jump(); // jump to EAX
}

The problem here is that after we call
"__builtin_set_tailcall_address", the destructor of "str" still has to
run before the jump. Deallocating its memory will almost certainly
overwrite EAX, which is a caller-saved scratch register that any
called function is free to clobber, and so the next line,
"__builtin_pop_args_and_jump", will jump to a garbage address.
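The hazard can be modeled in ordinary C++, assuming a global variable named eax stands in for the register and a hypothetical ClobbersEax type stands in for std::string (its destructor writes EAX, as a real deallocation call is allowed to). An inner block forces the destructor to run between the "set" and the "jump", which is the ordering a tail call would require.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for EAX, a caller-saved (scratch) register.
std::uint32_t eax = 0;

// Stands in for std::string: its destructor uses EAX, as a real
// operator delete / free call is free to do.
struct ClobbersEax {
  ~ClobbersEax() { eax = 0xBADBAD; }
};

// Models the naive interceptor: EAX is set, the destructor of 'str'
// runs before the tail jump, then the jump reads EAX.
std::uint32_t NaiveJumpTarget(std::uint32_t address) {
  {
    ClobbersEax str;  // the local std::string
    eax = address;    // __builtin_set_tailcall_address(address)
  }                   // destructor of 'str' runs here
  return eax;         // __builtin_pop_args_and_jump() jumps to EAX
}
```

NaiveJumpTarget(0x401000) returns 0xBADBAD rather than 0x401000: the jump target was lost between the two builtins.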

I'm thinking that there are two possible solutions to this. The first
is to wrap all the interceptor code inside a lambda, like this:

[[musttail]] void Func(void)
{
  __builtin_push_args();

  auto const address = []() -> void(*)(void)
  {
    std::string str;
    str += "dsfghslakdfhsklahfskahfklashfkljash";
    str += "afhsklahfsahflksah;sahfkjashfklshaf";
    str += "afhsklahfsahflksah;sahfkjashfklshaf";
    str += "afhsklahfsahflksah;sahfkjashfklshaf";
    return reinterpret_cast<void(*)(void)>(GetProcAddress( . . . ));
  }(); // <----------------- note the trailing (): we're invoking the lambda!

  __builtin_set_tailcall_address(address); // On x86_32, store 'address' in EAX
  return __builtin_pop_args_and_jump(); // jump to EAX
}

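I think this will ensure that EAX won't be overwritten by a
destructor: "str" is destroyed inside the lambda, before the lambda
returns, so nothing runs between "__builtin_set_tailcall_address" and
"__builtin_pop_args_and_jump" (as long as Func itself has no locals
with non-trivial destructors). So that's one possible solution.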
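The ordering argument behind the lambda version can be checked with an event log. In this sketch, Noisy is a hypothetical stand-in for the std::string, 0x401000 is a made-up stand-in for the GetProcAddress result, and the two builtins are represented only by log entries.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> events;  // records the order in which things happen

struct Noisy {  // stands in for the std::string in the interceptor
  ~Noisy() { events.push_back("destroy str"); }
};

void Func() {
  auto const address = [] {
    Noisy str;
    events.push_back("look up address");
    return 0x401000;  // hypothetical stand-in for GetProcAddress(...)
  }();  // invoked immediately; 'str' is destroyed before the lambda returns
  events.push_back("set_tailcall_address");  // nothing runs between
  events.push_back("pop_args_and_jump");     // these two steps
  (void)address;
}
```

After Func() the log reads: look up address, destroy str, set_tailcall_address, pop_args_and_jump. The destructor has already finished before the address is parked in the scratch register.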

I think the second possible solution is to program the compiler to
give special treatment to the invocation of
"__builtin_pop_args_and_jump": instead of pushing the address onto
the stack, it would put it in a register. The handiest way to pull
this off might be to specify the calling convention as "__thiscall",
because then the first argument goes in ECX instead of on the stack.
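For comparison, here is where the first integer/pointer argument lives at function entry under several conventions. The Convention enum and FirstIntArg are illustrative names only. Note that MSVC accepts __thiscall only on member functions, while GCC and Clang offer __attribute__((thiscall)) on 32-bit x86; fastcall is included as another register-based x86_32 option.

```cpp
#include <cassert>
#include <string_view>

enum class Convention {
  SysV_x64, MS_x64, Cdecl_x86, Stdcall_x86, Thiscall_x86, Fastcall_x86, AAPCS64
};

// Where the first integer/pointer argument lives at function entry.
constexpr std::string_view FirstIntArg(Convention c) {
  switch (c) {
    case Convention::SysV_x64:     return "rdi";
    case Convention::MS_x64:       return "rcx";
    case Convention::Cdecl_x86:    return "[esp+4]";  // on the stack
    case Convention::Stdcall_x86:  return "[esp+4]";  // on the stack
    case Convention::Thiscall_x86: return "ecx";
    case Convention::Fastcall_x86: return "ecx";      // second argument in edx
    case Convention::AAPCS64:      return "x0";
  }
  return "unknown";
}

// A location written as "[...]" is a stack slot; anything else is a register.
constexpr bool InRegister(Convention c) { return FirstIntArg(c).front() != '['; }
```

Only the default 32-bit x86 conventions put the first argument in memory, which is the case that forces the special handling discussed here.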

As I said, though, I don't want to code this 6 ways for 6 different
architectures. Even if we can get "__thiscall" to work on x86_32,
other architectures might not make it so easy to place the first
argument into a register rather than on the stack. For this reason,
the best solution may be the first one I posted above, the one that
wraps everything inside a lambda so that a destructor can't overwrite
the scratch register.

Received on 2026-04-13 12:51:01