std-proposals

Re: [std-proposals] Interceptor Function (preserve stack and all registers)

From: Lorand Szollosi <szollosi.lorand_at_[hidden]>
Date: Fri, 26 Jul 2024 19:37:04 +0300
Hi,

This is, as I read it, essentially the same thing as the function jump in C--, which enables some optimizations and features in Haskell. It’s very much needed in continuation-passing / continuation-pool style, so dismissing it with “how many times do you need it” is an invalid argument: I’d use it all the time.

It’d also help in some cases for which we already have multiple proposal ideas circulating here, which basically amount to multi-return (including returning twice, or returning any number of types not listed in advance, e.g. types created by a third-party template function).

I have actually encountered use cases where we had to wrap a lambda in a std::function<…>, and thus pay for a type-erased indirect call, instead of passing the lambda to the function we’d jump to. This slows down inner loops in message-processing (HFT) code.

So yes, even if many people here don’t want full call/cc in C++, the jump-to-function could be a useful step in the direction of handling continuations.
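
To make the std::function point concrete, here is a minimal sketch in today's C++. The names Message, decode_erased, decode_direct and demo are hypothetical, made up for illustration, and are not from any real codebase. It only contrasts a type-erased continuation with one passed as a template parameter. Neither variant is guaranteed to compile the final call as a jump, which is the part a jump-to-function facility would have to add.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical message type; not from the original post.
struct Message {
    int id;
    int payload;
};

// Variant A: the continuation is type-erased behind std::function, so each
// invocation is an indirect call through the wrapper.
inline int decode_erased(const Message& m, const std::function<int(int)>& next) {
    return next(m.payload * 2);  // call in tail position
}

// Variant B: the continuation's concrete type is a template parameter, so
// the compiler sees the lambda body and is free to inline it.
template <typename Next>
int decode_direct(const Message& m, Next&& next) {
    return std::forward<Next>(next)(m.payload * 2);  // call in tail position
}

// Small usage example: both variants compute the same result.
inline int demo() {
    const Message m{1, 21};
    auto add_one = [](int v) { return v + 1; };
    const int a = decode_erased(m, add_one);
    const int b = decode_direct(m, add_one);
    assert(a == b);
    return b;  // 21 * 2 + 1
}
```

Variant B only works when the callee can be a template visible at the call site. When the continuation has to cross a non-template boundary, type erasure (Variant A) is the usual fallback, and that is where the extra indirection comes from.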

Thanks,
-lorro

> On 26 Jul 2024, at 13:22, Frederick Virchanza Gotham via Std-Proposals <std-proposals_at_[hidden]> wrote:
>
> Let's say we have gotten a proprietary library file from somewhere,
> it's built in Release mode, stripped of all its symbols, and the only
> info we have on it is the accompanying header file. Let's say one of
> the functions in the header file is:
>
> extern int FlushPipe(void *pipe, int flags);
>
> If this library file is used as part of a very big project, and if
> we're trying to figure out what's causing the program to crash, we can
> write a "man in the middle" library that intercepts these function
> calls, logs them, and then calls the original function, something
> like:
>
> typedef int (*FuncPtr_FlushPipe)(void *pipe, int flags);
>
> template<typename... Params>
> decltype(auto) FlushPipe(Params&&... args)
> {
>     WriteLog( "Function called 'FlushPipe'");
>     auto const h = LoadLibraryA("monkey.dll");
>     auto const pf = (FuncPtr_FlushPipe)GetProcAddress(h, "FlushPipe");
>     return pf( static_cast<Params&&>(args)... );
> }
>
> The above code snippet will work fine if we can put it into a header
> file that will be compiled into the program. But sometimes we can't
> recompile a proprietary library, so we need to actually create a "man
> in the middle" library that exports all the same functions as the
> original library. So then our "man in the middle" library would export
> the following function:
>
> typedef int (*FuncPtr_FlushPipe)(void *pipe, int flags);
>
> extern "C" int FlushPipe(void *const pipe, int const flags)
> {
>     WriteLog( "Function called 'FlushPipe'");
>     auto const h = LoadLibraryA("monkey.dll");
>     auto const pf = (FuncPtr_FlushPipe)GetProcAddress(h, "FlushPipe");
>     return pf( pipe, flags );
> }
>
> Two weeks ago in my work, I was assigned a bug that was more
> complicated than this, and what I really needed in C++ was an
> "interceptor function", which would look something like this:
>
> auto Func(auto) interceptor
> {
>     WriteLog( "Function called 'FlushPipe'");
>     auto const h = LoadLibraryA("monkey.dll");
>     auto const pf = (void(*)(void))GetProcAddress(h, "FlushPipe");
>     goto pf;
> }
>
> The above function marked as an "interceptor" always ends with a jump
> to another function. Up until the jump takes place, it can alter the
> registers and the stack however it pleases, however it must restore
> all registers and the stack before jumping to the other function.
>
> The bug I was investigating at work -- which took me 10 working days
> to figure out -- involved one massive program (as massive as your
> favourite word processor) running on MS-Windows x86_64, and the crash
> was caused by the interaction with 5 DLL files that also interact with
> each other. I tried my best using debugger tools, but since all the
> DLLs had their symbols stripped, it was slow progress. I disassembled
> portions of the proprietary DLLs but still didn't make much progress
> reading through the instructions. I replaced the main DLL with a "man
> in the middle" as described above. I made a little progress doing
> this, but still needed more info. So next I replaced all 5 DLL files
> with a "man in the middle" -- which was a little tricky because some
> of the DLLs were proprietary and I didn't have their header files.
>
> Let's say the proprietary DLL is called "monkey.dll", and so we use
> Dependency Walker or 'dumpbin' to get a list of all exported functions
> as follows:
>
> OpenPipe
> FlushPipe
> ReadPipe
> WritePipe
>
> These 4 names are not mangled, and as we don't have the accompanying
> header file, we don't know what these functions' parameters are, and
> we don't know what the return type is either. Even if they were
> mangled, the mangled name might not contain the return type (e.g. the
> GNU compiler mangling omits the return type). If the return type is
> small, it will simply go in the RAX register, but if it's big or
> both-unmovable-and-uncopyable, then the "return slot" will be passed
> in the RCX register.
>
> Since C++23 doesn't allow us to write an "interceptor" function as I
> described above, I had to instead code it in x86_64 assembler. Here's
> how I wrote the interceptor for "OpenPipe":
>
> OpenPipe:
>     push_all_registers          ; save all register values
>     lea mystring(%rip), %rcx    ; set 1st parameter to string
>     call Record_Function_Call_And_Get_Function_Address
>     pop_all_registers           ; restore all register values
>     jmp *%rax                   ; jump to the original function
> mystring:
>     .byte 'O','p','e','n','P','i','p','e', 0
>
> The line "push_all_registers" is a macro that expands to:
>
> pushfq ; push the status register
> push %rbx
> push %rcx
> push %rdx
> ..... and so on
>
> The line "pop_all_registers" is a macro that expands to:
> . . . .
> pop %rdx
> pop %rcx
> pop %rbx
> popfq ; pop the status register
>
> So in the assembler function I wrote, I save the value of every
> register except for RAX (because I need to use RAX to store the
> address of the jump location), and then I restore all registers and
> restore the stack before jumping. I got this working, and the log file
> enabled me to see the order in which the functions were getting called
> . . . and ultimately I traced the problem down to "std::strchr"
> returning a nullptr that later got dereferenced.
>
> But the take-away from all this is:
> Instead of having to write interceptor functions in assembler, it
> would be nice if C++26 would let us write an interceptor as follows:
>
> auto Func(auto) interceptor
> {
>     WriteLog( "Function called 'FlushPipe'");
>     auto const h = LoadLibraryA("monkey.dll");
>     auto const pf = (void(*)(void))GetProcAddress(h, "FlushPipe");
>     goto pf;
> }
>
> There could be more than one "goto" inside the interceptor function, such as:
>
> auto Func(auto) interceptor
> {
>     WriteLog( "Function called 'FlushPipe'");
>     auto const h = LoadLibraryA("monkey.dll");
>     if ( ! h ) goto std::abort;
>     auto const pf = (void(*)(void))GetProcAddress(h, "FlushPipe");
>     if ( ! pf ) goto std::abort;
>     goto pf;
> }
>
> I would put two rules on interceptor functions:
>
> (1) An exception must not propagate outside of an interceptor before
> the jump takes place. If this occurs then either (1.a) The terminate
> handler is called, or (1.b) undefined behaviour.
>
> (2) If control reaches the end of an interceptor function without a
> "goto" statement, then it's as though the interceptor ended with "goto
> std::abort".
> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals

Received on 2024-07-26 16:37:21