ISOCPP std-proposals List: Re: [std-proposals] Interceptor Function (preserve stack and all registers)

From: Frederick Virchanza Gotham <cauldwell.thomas_at_[hidden]>
Date: Mon, 29 Jul 2024 09:36:07 +0100

On Mon, Jul 29, 2024 at 12:23 AM Lorand Szollosi wrote:
>
> But note, this needs moves (or any other kind of forwarding, which
> might or might not be detectable in the code of the DLL being loaded).
> If you don't want locals of the interceptor to interfere, then you can't
> have locals (that cannot be overwritten), and that's a serious limitation.

You can change the stack all you want inside an interceptor function,
so long as you restore it before the jump, and once again restore it
before jumping back to the caller. We can have "local variables" that
persist past the jump -- but these local variables cannot be on the
stack.

> In fact, if you were to use std::lock_guard<std::mutex> there, it
> would probably be overwritten on the stack. This is because, on
> most architectures, parameters and locals (ASDVs) use the same
> stack. Now, you might say this should work:
>
> std::mutex m;
>
> auto OpenSocket(auto&&... args) /* interceptor */
> {
> WriteLog( "Function called 'OpenSocket'");
> std::lock_guard<std::mutex> guard(m);
> auto const h = LoadLibraryA("transport.dll.original");
> auto const pf = (void(*)(void))GetProcAddress(h, "OpenSocket");
> return pf(std::forward<decltype(args)>(args)...);
> }

It's reasonable to expect that something like the above function
could be valid C++26. Of course, the variable called 'guard' cannot be
on the stack; instead, it could be thread-local. It would be an
unusual kind of thread-local variable in that it's constructed and
destroyed every time the function is called. The compiler would,
roughly speaking, turn your code above into something like:

    std::mutex m;

    alignas(std::lock_guard<std::mutex>) thread_local std::byte
        storage[ sizeof(std::lock_guard<std::mutex>) ];

    auto OpenSocket(auto&&... args) /* interceptor */
    {
        WriteLog( "Function called 'OpenSocket'");
        ::new(&storage) std::lock_guard<std::mutex>(m);
        auto const h = LoadLibraryA("transport.dll.original");
        auto const pf = (void(*)(void))GetProcAddress(h, "OpenSocket");
        goto pf( std::forward<decltype(args)>(args)... );  // proposed syntax
        std::launder(reinterpret_cast<std::lock_guard<std::mutex>*>(storage))->~lock_guard();
        goto return;                                       // proposed syntax
    }
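The per-call thread-local lifetime described above can be approximated in today's C++ without new syntax. This is a minimal sketch, not the proposal itself: `TracedMutex`, `Target` and `Interceptor` are hypothetical names, `TracedMutex` is a stand-in lockable (not a real mutex) so that the lock state can be checked, and the "jump" is just an ordinary call, so the caller's registers and stack are not preserved as the proposal requires.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <new>

// Hypothetical BasicLockable that records whether it is held, so the
// sketch can be checked. It does not provide any real mutual exclusion.
struct TracedMutex {
    bool held = false;
    void lock()   { held = true; }
    void unlock() { held = false; }
};

TracedMutex m;
using Guard = std::lock_guard<TracedMutex>;

// Suitably aligned thread-local bytes for one Guard. No object lives here
// between calls; a Guard is constructed and destroyed on every call.
alignas(Guard) thread_local std::byte guard_storage[sizeof(Guard)];

// Hypothetical stand-in for the real OpenSocket in the original DLL.
int Target(int x)
{
    assert(m.held);  // the guard is alive while the target runs
    return x * 2;
}

int Interceptor(int x)
{
    // Build the guard in thread-local storage rather than on the stack.
    Guard *const g = ::new (static_cast<void *>(guard_storage)) Guard(m);
    int const result = Target(x);  // an ordinary call standing in for the jump
    std::destroy_at(g);            // runs ~lock_guard, which unlocks m
    return result;
}
```

Calling `Interceptor(21)` returns 42, and `m.held` is false again afterwards because the guard was destroyed explicitly.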

However, I wouldn't use the syntax "(auto&&... )" combined with
"forward<decltype(args)>(args)...", because it implies that all the
parameters are memory addresses (i.e. everything is passed by
reference). I would prefer:

     std::mutex m;

     auto OpenSocket(auto) interceptor
     {
         WriteLog( "Function called 'OpenSocket'" );
         std::lock_guard<std::mutex> guard(m);
         auto const h = LoadLibraryA("transport.dll.original");
         auto const pf = GetProcAddress(h, "OpenSocket");
         goto(pf);
         return;
     }
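To make the concern about "(auto&&... )" concrete: in standard C++20, a parameter declared 'auto&&' is a forwarding reference, so decltype(args) is always a reference type, even when the caller passes a temporary such as 42. A minimal sketch; the two helper names are hypothetical and exist only for this demonstration:

```cpp
#include <cassert>
#include <type_traits>

// True if every parameter of this abbreviated function template was
// deduced as a reference type (lvalue or rvalue reference).
bool AllParamsAreReferences(auto&&... args)
{
    return (std::is_reference_v<decltype(args)> && ...);
}

// The same check for a by-value parameter pack, for comparison.
bool AnyParamIsReference(auto... args)
{
    return (std::is_reference_v<decltype(args)> || ...);
}
```

For example, with `int i`, the call `AllParamsAreReferences(i, 42, 3.0)` deduces `int&`, `int&&` and `double&&`, whereas the by-value pack in `AnyParamIsReference` deduces plain `int` and `double`.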

> Also, it's not apparent whether the goto pf in this mail is supposed to
> mean the same thing as in the original mail or not. Originally, you
> expected it to replace the current function; now, you expect it like a
> call, where we can return to.

I'll clarify that now. The keyword "goto" already exists in C++23
(and in every earlier standard), where it's used as follows to jump to
a label inside the same function:

    unsigned Func(unsigned &i)
    {
    Start:
        ++i;
        if ( 0u != (i % 7u) ) goto Start;
        return i;
    }
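For comparison, the label-based loop in Func behaves like a do-while loop: it increments i at least once and stops at the next multiple of 7. A minimal sketch placing the two forms side by side (FuncDoWhile is a hypothetical name):

```cpp
#include <cassert>

// The example above: goto jumps back to a label in the same function.
unsigned Func(unsigned &i)
{
Start:
    ++i;
    if ( 0u != (i % 7u) ) goto Start;
    return i;
}

// Equivalent structured form: increment at least once, stop at a multiple of 7.
unsigned FuncDoWhile(unsigned &i)
{
    do {
        ++i;
    } while ( 0u != (i % 7u) );
    return i;
}
```

Starting from 10, both return 14; starting from 14, both return 21, because the increment always happens before the test.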

With the proposed "interceptor functions" in C++26, we could also
use "goto" followed by an expression in parentheses. The expression
must be some kind of pointer, because we need the address of the
entry point of the function or member function. Like this:

     auto OpenSocket(auto) interceptor
     {
         goto( some_global_function_pointer_variable );
         return;
     }

Whenever you use "goto( expression )", it is not just a
straightforward jump. Instead, it changes the return address at the
top of the stack so that the jumped-to function returns back to the
interceptor function. The interceptor function itself isn't returned
from until its "return" statement is encountered.
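The control flow described above (the jumped-to function returns into the interceptor, which carries on until its own return) is what an ordinary call already does in standard C++; the proposal's addition is that the callee sees the caller's original arguments, registers and stack untouched. The sketch below models only the control flow, using a hypothetical `Target`, a hypothetical `Interceptor`, and an event log; it makes no attempt at register or stack preservation.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> events;  // records the order in which things happen

// Hypothetical function being intercepted.
int Target(int x)
{
    events.push_back("target");
    return x + 1;
}

// Function pointer standing in for "some_global_function_pointer_variable".
int (*some_global_function_pointer_variable)(int) = &Target;

// Roughly models: log; goto( pointer ); log; return;
int Interceptor(int x)
{
    events.push_back("before");
    int const r = some_global_function_pointer_variable(x);  // callee returns here
    events.push_back("after");  // runs only once the callee has returned
    return r;
}
```

After `Interceptor(1)`, which returns 2, the log reads "before", "target", "after".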

Of course, the compiler can perform an optimisation. If a
"goto( expression )" statement is followed immediately by a "return"
statement, and no destructors (such as that of 'guard') still need to
run in between, then there's no need to change the return address at
the top of the stack; it can just be a straightforward jump.
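Ordinary calls are subject to the same reasoning: a call whose result is returned directly, with nothing left to do afterwards, is in tail position, and compilers commonly (though with no guarantee) emit it as a jump. A local object with a non-trivial destructor changes that, because the destructor has to run after the callee returns. A minimal sketch; `Scoped`, `Target`, `TailForm` and `GuardedForm` are hypothetical names, with `Scoped` standing in for std::lock_guard:

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> events;  // records the order in which things happen

// RAII type with an observable destructor, standing in for std::lock_guard.
struct Scoped {
    ~Scoped() { events.push_back("destructor"); }
};

int Target(int x)
{
    events.push_back("target");
    return x * 3;
}

// Nothing happens after the call: it is in tail position, so a compiler
// may (but need not) compile it as a plain jump to Target.
int TailForm(int x)
{
    return Target(x);
}

// 'guard' must be destroyed after Target returns, so this call generally
// cannot be replaced by a plain jump; Target needs a return address here.
int GuardedForm(int x)
{
    Scoped guard;
    return Target(x);
}
```

The event log shows the difference: `TailForm(2)` records only "target", while `GuardedForm(2)` records "target" followed by "destructor".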

Received on 2024-07-29 08:36:16