Date: Fri, 2 Aug 2024 12:29:11 +0100
On Thu, Aug 1, 2024 at 7:59 PM Thiago Macieira wrote:
>
> Please describe how such functionality could be used, using only the Standard
> Library or with content you provide in the paper.
>
> Hint: you can't.
> You're not explaining how the interceptor function gets installed. It appears
> to be automatic by just having it declared, which is obviously not the case.
> There needs to be more steps to make it happen.
1) Create a library called "libmonkey.so" or "monkey.dll" that exports
6 unmangled functions
2) Create an executable program "prog" or "prog.exe" and link it with
the Monkey library (e.g. g++ -o prog prog.cpp -lmonkey)
3) Now rename the Monkey library to "libmonkey.so.original"
4) Get a list of all exported functions in the Monkey library with:
nm -D libmonkey.so.original | grep -i " T "
5) Create a 'man in the middle' library file called 'libmonkey.so'
containing interceptors for the 6 symbols, each of which forwards to
its counterpart in libmonkey.so.original
6) Run the executable program
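As a minimal sketch of step 5 on Linux, one interceptor in the shim could look like the code below. It assumes the dynamic loader can find libmonkey.so.original on its search path, and it assumes, purely for illustration, that OpenPipe has the signature int(const char*). ResolveOriginal is a hypothetical helper built on the POSIX dlopen/dlsym calls. When the signature is not known, a C++ shim like this cannot be written at all, which is the case the register-preserving interceptor is for.

```cpp
// monkey_shim.cpp: hypothetical 'man in the middle' libmonkey.so (Linux).
// Assumed build: g++ -shared -fPIC -o libmonkey.so monkey_shim.cpp -ldl
#include <cassert>
#include <cstdio>
#include <dlfcn.h>

// Look up `symbol` in `library_path`; a null path means the global scope
// of the running program. Returns nullptr (and logs why) on failure.
void* ResolveOriginal(const char* library_path, const char* symbol)
{
    assert(symbol != nullptr);
    void* handle = dlopen(library_path, RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) {
        const char* why = dlerror();
        std::fprintf(stderr, "dlopen: %s\n", why ? why : "unknown error");
        return nullptr;
    }
    dlerror();  // clear any stale error before calling dlsym
    void* address = dlsym(handle, symbol);
    if (address == nullptr) {
        const char* why = dlerror();
        std::fprintf(stderr, "dlsym(%s): %s\n", symbol, why ? why : "null symbol");
    }
    return address;  // the handle is deliberately kept open
}

// Hypothetical export; int(const char*) is an ASSUMED signature.
extern "C" int OpenPipe(const char* name)
{
    using Fn = int (*)(const char*);
    static const Fn original = reinterpret_cast<Fn>(
        ResolveOriginal("libmonkey.so.original", "OpenPipe"));
    std::fprintf(stderr, "-> OpenPipe(%s)\n", name ? name : "(null)");
    const int result = original ? original(name) : -1;  // -1: assumed error value
    std::fprintf(stderr, "<- OpenPipe returned %d\n", result);
    return result;
}
```

The original library is resolved lazily, on the first call, so the shim does not depend on load order between itself and libmonkey.so.original.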
> In your motivation section, please explain why this should be in the standard,
> not as a functionality provided by the toolchain and debugger. If I wanted to
> log all calls to a specific function, besides ltrace I have already mentioned,
> it would be easy to put a breakpoint in the function in question and run some
> debugger script to print what I needed. Given that those things exist, please
> improve your motivation with additional reasons beyond what's there.
Interceptors are useful when dealing with pre-compiled libraries that
have been stripped of everything except their exported symbols, and
for which you might not even have the header file.
> I don't think your solution for reentrancy is a good one. I recommend instead
> requiring the runtime to allocate a per-thread shadow stack (like the Indirect
> Branch Tracking feature of Intel CET requires) and push data into it. This
> removes the need for the developer to estimate anything and pushes the problem
> to the implementation, which can possibly also be tuned at runtime. Further,
> this improves space utilisation because then multiple intercepting functions
> in the same thread can share the space, instead of each one having reserved
> its worst case scenario. And I don't think the requirement to update the
> runtime would be a deal breaker, since we are talking about a debugging
> feature. Developers debug using modern tools.
Yeah, but there isn't a hell of a lot a programmer can do when
debugging a program linked with a proprietary library they know
nothing about, especially if the function names aren't mangled (so
they don't encode the parameter types) and they don't have the header
file. Using "objdump" or "Dependency Walker", they can see that
there's an exported function called "OpenPipe", but they know neither
the return type nor the parameter types. This is where you need an
interceptor that preserves the stack and the registers.
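By contrast, when the signature is known, the compiler can do all of the register and stack preservation itself. A minimal sketch follows, using a hypothetical MakeInterceptor template whose parameters mirror the Inward/Original/Outward/arg_to_forward arguments of the WriteInterceptor function proposed later in this message. Note that it returns a std::function rather than a plain function pointer, so it cannot simply be exported from a shim library as-is.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper: wraps a function of KNOWN signature so that
// inward(arg) runs before it and outward(arg) runs after it.
template <typename R, typename... Args>
std::function<R(Args...)> MakeInterceptor(void (*inward)(void*),
                                          R (*original)(Args...),
                                          void (*outward)(void*),
                                          void* arg_to_forward)
{
    assert(inward && original && outward);
    return [=](Args... args) -> R {
        inward(arg_to_forward);  // e.g. lock a mutex
        // Calls outward when the scope ends, i.e. after the return value
        // has been computed, or during unwinding if `original` throws.
        struct OutwardGuard {
            void (*fn)(void*);
            void* arg;
            ~OutwardGuard() { fn(arg); }
        } guard{outward, arg_to_forward};
        return original(std::forward<Args>(args)...);
    };
}
```

For example, with an Inward that adds 1 to an int counter and an Outward that multiplies it by 10, one call through the wrapper takes the counter from 0 to 10, which shows Inward ran first.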
> Your possible implementation for x86 does both too much and too little. Why
> are you preserving some non-preserved registers and not others? Either you
> save the entire context (using XSAVE, which requires dynamic buffer sizes and
> CPUID detection) or you limit yourself to what the ABI requires. This is not
> merely an implementation-defined behaviour because it impinges on whether you
> can intercept any function (or for that matter any instruction) or whether you
> can only intercept at function call boundaries that respect ABI. That is, is
> this feature expected to be able to intercept calls between two different
> routines written in assembly that don't obey the ABI?
So far I haven't been able to preserve every single register. I need
at least one register to hold the jump target, and the best choice for
this on x86_32 is ECX (free under __cdecl and __stdcall, but not under
__thiscall or __fastcall, which pass an argument in it), while on
x86_64 the best choice is R11. But you could have a bizarre program
that uses its own calling convention with its own shared libraries,
for example one in which the return value is put in R11, and then the
interceptor won't work. Of course, you could tweak the interceptor to
use a different register.
> If you do turn this into interception of ABI-compliant function calls, then
> you should add some non-normative words about how a compiler would implement
> interception of different ABIs, as an extension. That is, for Windows 32-bit
> i386, interception of __cdecl, __thiscall, __stdcall and __syscall.
When tailoring an interceptor to a particular calling convention,
there are a few kinds of register preservation that must take place.
Before the jump to the original function, we must preserve:
A) All of the function argument registers
B) The callee-saved registers we will use
After the original function has returned, we must preserve:
C) All of the return-value registers
B) The callee-saved registers we will use (as above)
So for example on x86_64 System V, that's:
A) RDI, RSI, RDX, RCX, R8, R9 and XMM0–XMM7, plus R10 (the static
chain pointer) and AL (the vector-register count for variadic calls)
B) Any of RBX, RBP and R12–R15 (RSP is callee-saved too, which is
handled by balancing every push with a pop)
C) RAX, RDX, XMM0 and XMM1
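So the interceptor's control flow on x86_64 goes something like:
push RDI, RSI, RDX, RCX, R8, R9, R10, RAX (RAX because AL may be an argument)
save XMM0–XMM7 (SUB RSP + MOVDQU, since PUSH can't take XMM registers)
push R12 (if we need it for temp storage)
pad RSP so that it is 16-byte aligned at the call
call LockMutex
undo the padding
pop R12
restore XMM0–XMM7
pop RAX, R10, R9, R8, RCX, RDX, RSI, RDI (reverse order of the pushes)
manipulate_return_address_on_stack (save the caller's return address
    and replace it with the address of the instruction after the jmp)
jmp OriginalFunction (using R11)
push RAX, RDX (save the integer return values)
save XMM0, XMM1 (the floating-point return values)
push R12 (if we need it for temp storage)
pad RSP so that it is 16-byte aligned at the call
call UnlockMutex
undo the padding
pop R12
restore XMM0, XMM1
pop RDX, RAX (reverse order of the pushes)
jmp OriginalReturnAddress (using R11)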
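The final jmp OriginalReturnAddress needs the caller's return address to survive until the original function returns, and a single slot per interceptor breaks as soon as the intercepted function recurses or calls another intercepted function. One way to handle that, along the lines of the per-thread shadow stack suggested in the review, is a thread-local LIFO that manipulate_return_address_on_stack pushes to and that is popped just before the final jump. This is a minimal sketch: PushReturnAddress and PopReturnAddress are hypothetical names, and the generated code would have to call them inside the same register-saving bracket as LockMutex and UnlockMutex. Two caveats: push_back may allocate, so intercepting malloc itself this way would recurse, and a longjmp or exception that bypasses the interceptor's return path would leave a stale entry behind.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers that generated interceptor code could call (inside
// the same register-saving bracket as LockMutex/UnlockMutex).
namespace {
// One LIFO per thread: nested or recursive intercepted calls unwind in the
// right order, and every interceptor on the thread shares the storage.
thread_local std::vector<void*> g_saved_return_addresses;
}  // namespace

// Called from manipulate_return_address_on_stack: remember the caller's
// return address before it is overwritten.
extern "C" void PushReturnAddress(void* return_address)
{
    g_saved_return_addresses.push_back(return_address);  // may allocate
}

// Called after the original function returns: fetch the address to jump to.
extern "C" void* PopReturnAddress()
{
    assert(!g_saved_return_addresses.empty() && "unbalanced interceptor exit");
    void* top = g_saved_return_addresses.back();
    g_saved_return_addresses.pop_back();
    return top;
}

// Current nesting depth of intercepted calls on this thread.
extern "C" std::size_t SavedReturnAddressDepth()
{
    return g_saved_return_addresses.size();
}
```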
> I think you got carried away with what you could do and didn't pay attention
> to what you needed to do.
>
> Given the lack of means to install this, the impossibility of describing the
> restricted conditions under which it applies, the fragility of the solution
> and the availability of much less fragile means for most work, I don't think
> this belongs in the Standard.
I'm actually thinking now that I'll write a function something like
the following:
    auto WriteInterceptor(void (*Inward)(void*),
                          void (*Original)(void),
                          void (*Outward)(void*),
                          void *arg_to_forward) -> void(*)(void);
You supply this function with 4 arguments:
1) The address of the "Inward" code (e.g. a function that locks a mutex)
2) The address of the original function
3) The address of the "Outward" code (e.g. a function that unlocks a mutex)
4) User-defined data to be passed to the Inward and Outward functions
This function will write the machine code for the interceptor (so it
will allocate a page of memory, write the machine code and mark the
page as executable), and return a pointer to the interceptor.
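The "allocate a page, write the machine code, mark it executable" part can be sketched for x86-64 on Linux/POSIX as follows. This covers only the simplest possible piece, a stub that tail-jumps to the original function through R11 (the scratch register chosen earlier); the Inward/Outward calls and the register saving would be emitted around it. MakeJumpStub is a hypothetical name, the page is never freed in this sketch, and systems that forbid making writable memory executable (some SELinux or hardened configurations) will make mprotect fail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

// Sketch for x86-64 POSIX only (assumption): emit a 13-byte stub
//   movabs r11, <target>   ; 49 BB imm64
//   jmp    r11             ; 41 FF E3
// into a freshly mapped page, then flip the page from writable to
// executable. Returns nullptr on failure.
void* MakeJumpStub(void* target)
{
    assert(target != nullptr);
    const long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0) return nullptr;
    const std::size_t size = static_cast<std::size_t>(page_size);

    void* page = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) return nullptr;

    unsigned char code[13] = {0x49, 0xBB};  // movabs r11, imm64
    const std::uint64_t address = reinterpret_cast<std::uintptr_t>(target);
    std::memcpy(code + 2, &address, sizeof address);  // little-endian imm64
    code[10] = 0x41; code[11] = 0xFF; code[12] = 0xE3;  // jmp r11
    std::memcpy(page, code, sizeof code);

    // Writable -> executable; never both at once.
    if (mprotect(page, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(page, size);
        return nullptr;
    }
    return page;
}
```

Because the stub jumps rather than calls, the caller's return address is left untouched and the original function returns straight to the caller, so calling the stub behaves exactly like calling the target.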
Received on 2024-08-02 11:29:26