Date: Sat, 11 Apr 2026 23:10:47 +0100
I typically don't top-post, but the original post below is very long
so I'm top-posting.
With regard to the idea I presented in my email below to this mailing
list two years ago, I've been thinking about the easiest way to
implement it (for example in the GNU g++ compiler).
In a few compilers, we already have the statement attribute
[[clang::musttail]] (GCC 15 added it as [[gnu::musttail]]), which means
that the return statement it is applied to must be a tail call -- which
has the added assurance that the stack won't have extra stuff pushed
onto it when the jump occurs, and this is _exactly_ what we need for
interceptor functions.
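To make the tail-call guarantee concrete, here is a minimal sketch of a
forwarding function whose return statement carries the attribute. The
names Target and Forwarder and the MUSTTAIL macro are made up for
illustration. The macro expands to nothing on compilers that offer
neither spelling, in which case the tail call is only an optimisation
and not a guarantee.

```cpp
#include <cstdio>

// Pick whichever spelling of the statement attribute the compiler offers.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::musttail)
#    define MUSTTAIL [[clang::musttail]]
#  elif __has_cpp_attribute(gnu::musttail)
#    define MUSTTAIL [[gnu::musttail]]
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL   // no guarantee available: behaves as a plain return
#endif

// Hypothetical stand-in for the original library function.
int Target(void *pipe, int flags)
{
    return pipe ? flags : flags + 1;
}

// Logs the call, then hands over to Target with the same arguments.
int Forwarder(void *pipe, int flags)
{
    std::puts("Function called 'Target'");
    MUSTTAIL return Target(pipe, flags);
}
```

Note that the caller and the callee have the same signature here, and
the forwarder has to know it. An interceptor would need the same
guarantee without knowing the signature at all.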
So if you consider the following interceptor function:
    [[interceptor]] void Interceptor(void)
    {
        WriteLog( "Function called 'FlushPipe'");
        auto const h = LoadLibraryA("monkey.dll");
        auto const pf = GetProcAddress(h, "FlushPipe");
        goto -> pf;
    }
Well, under the hood it would work something like this:
    void Interceptor(void)
    {
        // e.g. rdi, rsi, rdx, rcx, r8, r9
        __builtin_push_all_argument_registers_onto_stack();
        WriteLog( "Function called 'FlushPipe'");
        auto const h = LoadLibraryA("monkey.dll");
        auto const pf = GetProcAddress(h, "FlushPipe");
        [[clang::musttail]]
        return __builtin_pop_all_arguments_off_stack_and_perform_tailcall( pf );
    }
So then I'd have to code two new built-ins:
    __builtin_push_all_argument_registers_onto_stack
    __builtin_pop_all_arguments_off_stack_and_perform_tailcall
And I might be able to adapt the pre-existing built-in
"__builtin_apply_args" so that I can patch the g++ compiler for every
instruction set (not just for x86_64).
I might start off though by just coding this feature for x86_64 System
V Linux, and then see whether it works properly with every bizarre kind
of function call (e.g. more than 6 arguments, some floating-point
arguments, and a returned object that goes in a return slot or in
RAX+RDX).
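The bizarre cases listed above could be collected into a small set of
target functions for exercising a prototype. This is only a sketch of
such a test set: every name below (Pair16, Big, ManyInts, MixedFloat,
ReturnPair, ReturnBig) is hypothetical, and the comments describe how I
expect the System V x86_64 ABI to pass each value.

```cpp
#include <cstdint>

// 16 bytes of integers: expected back in RAX+RDX under System V x86_64.
struct Pair16 { std::int64_t lo, hi; };

// 64 bytes: too big for registers, so the caller passes a return slot.
struct Big { std::int64_t v[8]; };

// More than 6 integer arguments: the 7th and 8th go on the stack.
long ManyInts(long a, long b, long c, long d, long e, long f, long g, long h)
{
    return a + 2*b + 3*c + 4*d + 5*e + 6*f + 7*g + 8*h;
}

// Integer and floating-point arguments mixed: doubles go in XMM registers.
double MixedFloat(int a, double x, int b, double y)
{
    return a * x + b * y;
}

// Small aggregate returned in a register pair.
Pair16 ReturnPair(std::int64_t lo, std::int64_t hi)
{
    return Pair16{lo, hi};
}

// Large aggregate returned through the hidden return slot.
Big ReturnBig(std::int64_t seed)
{
    Big b{};
    for (int i = 0; i < 8; ++i) b.v[i] = seed + i;
    return b;
}
```

Each target would be called once directly and once through an
interceptor, and the two results compared. A clobbered argument
register or a shifted stack should show up as a mismatch.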
On Fri, Jul 26, 2024 at 11:22 AM Frederick Virchanza Gotham wrote:
>
> Let's say we have gotten a proprietary library file from somewhere,
> it's built in Release mode, stripped of all its symbols, and the only
> info we have on it is the accompanying header file. Let's say one of
> the functions in the header file is:
>
> extern int FlushPipe(void *pipe, int flags);
>
> If this library file is used as part of a very big project, and if
> we're trying to figure out what's causing the program to crash, we can
> write a "man in the middle" library that intercepts these function
> calls, logs them, and then calls the original function, something
> like:
>
> typedef int (*FuncPtr_FlushPipe)(void *pipe, int flags);
>
>     template<typename... Params>
>     decltype(auto) FlushPipe(Params&&... args)
>     {
>         WriteLog( "Function called 'FlushPipe'");
>         auto const h = LoadLibraryA("monkey.dll");
>         auto const pf = (FuncPtr_FlushPipe)GetProcAddress(h, "FlushPipe");
>         return pf( static_cast<Params&&>(args)... );
>     }
>
> The above code snippet will work fine if we can put it into a header
> file that will be compiled into the program. But sometimes we can't
> recompile a proprietary library, so we need to actually create a "man
> in the middle" library that exports all the same functions as the
> original library. So then our "man in the middle" library would export
> the following function:
>
> typedef int (*FuncPtr_FlushPipe)(void *pipe, int flags);
>
>     extern "C" int FlushPipe(void *const pipe, int const flags)
>     {
>         WriteLog( "Function called 'FlushPipe'");
>         auto const h = LoadLibraryA("monkey.dll");
>         auto const pf = (FuncPtr_FlushPipe)GetProcAddress(h, "FlushPipe");
>         return pf( pipe, flags );
>     }
>
> Two weeks ago in my work, I was assigned a bug that was more
> complicated than this, and what I really needed in C++ was an
> "interceptor function", which would look something like this:
>
>     auto Func(auto) interceptor
>     {
>         WriteLog( "Function called 'FlushPipe'");
>         auto const h = LoadLibraryA("monkey.dll");
>         auto const pf = (void(*)(void))GetProcAddress(h, "FlushPipe");
>         goto pf;
>     }
>
> The above function marked as an "interceptor" always ends with a jump
> to another function. Up until the jump takes place, it can alter the
> registers and the stack however it pleases, but it must restore all
> registers and the stack before jumping to the other function.
>
> The bug I was investigating at work -- which took me 10 working days
> to figure out -- involved one massive program (as massive as your
> favourite word processor) running on MS-Windows x86_64, and the crash
> was caused by the interaction with 5 DLL files that also interact with
> each other. I tried my best using debugger tools, but since all the
> DLLs had their symbols stripped, it was slow progress. I disassembled
> portions of the proprietary DLLs but still didn't make much progress
> reading through the instructions. I replaced the main DLL with a "man
> in the middle" as described above. I made a little progress doing
> this, but still needed more info. So next I replaced all 5 DLL files
> with a "man in the middle" -- which was a little tricky because some
> of the DLLs were proprietary and I didn't have the header file.
>
> Let's say the proprietary DLL is called "monkey.dll", and so we use
> Dependency Walker or 'dumpbin' to get a list of all exported functions
> as follows:
>
> OpenPipe
> FlushPipe
> ReadPipe
> WritePipe
>
> These 4 names are not mangled, and as we don't have the accompanying
> header file, we don't know what these functions' parameters are, and
> we don't know what the return type is either. Even if they were
> mangled, the mangled name might not contain the return type (e.g. the
> GNU compiler mangling omits the return type). If the return type is
> small, it will simply go in the RAX register, but if it's big or
> both-unmovable-and-uncopyable, then the "return slot" will be passed
> in the RCX register.
>
> Since C++23 doesn't allow us to write an "interceptor" function as I
> described above, I had to instead code it in x86_64 assembler. Here's
> how I wrote the interceptor for "OpenPipe":
>
> OpenPipe:
>     push_all_registers              # save all register values
>     lea mystring(%rip), %rcx        # set 1st parameter to string
>     call Record_Function_Call_And_Get_Function_Address
>     pop_all_registers               # restore all register values
>     jmp *%rax                       # jump to the original function
> mystring:
>     .byte 'O','p','e','n','P','i','p','e', 0
>
> The line "push_all_registers" is a macro that expands to:
>
>     pushfq        # push the status register
>     push %rbx
>     push %rcx
>     push %rdx
>     ..... and so on
>
> The line "pop_all_registers" is a macro that expands to:
>     . . . .
>     pop %rdx
>     pop %rcx
>     pop %rbx
>     popfq         # pop the status register
>
> So in the assembler function I wrote, I save the value of every
> register except for RAX (because I need to use RAX to store the
> address of the jump location), and then I restore all registers and
> restore the stack before jumping. I got this working, and the log file
> enabled me to see the order in which the functions were getting called
> . . . and ultimately I traced the problem down to "std::strchr"
> returning a nullptr that later got dereferenced.
>
> But the take-away from all this is:
> Instead of having to write interceptor functions in assembler, it
> would be nice if C++26 would let us write an interceptor as follows:
>
>     auto Func(auto) interceptor
>     {
>         WriteLog( "Function called 'FlushPipe'");
>         auto const h = LoadLibraryA("monkey.dll");
>         auto const pf = (void(*)(void))GetProcAddress(h, "FlushPipe");
>         goto pf;
>     }
>
> There could be more than one "goto" inside the interceptor function, such as:
>
>     auto Func(auto) interceptor
>     {
>         WriteLog( "Function called 'FlushPipe'");
>         auto const h = LoadLibraryA("monkey.dll");
>         if ( ! h ) goto std::abort;
>         auto const pf = (void(*)(void))GetProcAddress(h, "FlushPipe");
>         if ( ! pf ) goto std::abort;
>         goto pf;
>     }
>
> I would put two rules on interceptor functions:
>
> (1) An exception must not propagate outside of an interceptor before
> the jump takes place. If this occurs then either (1.a) The terminate
> handler is called, or (1.b) undefined behaviour.
>
> (2) If control reaches the end of an interceptor function without a
> "goto" statement, then it's as though the interceptor ended with "goto
> std::abort".
Received on 2026-04-11 22:11:01
