ISOCPP std-proposals List: Re: [std-proposals] Interceptor Function (preserve stack and all registers)

From: Thiago Macieira <thiago_at_[hidden]>
Date: Fri, 02 Aug 2024 07:19:13 -0700

On Friday 2 August 2024 04:29:11 GMT-7 Frederick Virchanza Gotham via
Std-Proposals wrote:
> 1) Create a library called "libmonkey.so" or "monkey.dll" that exports
> 6 unmangled functions
> 2) Create an executable program "prog" or "prog.exe" and link it with
> the Monkey library (e.g. g++ -o prog prog.cpp -lmonkey)
> 3) Now rename the Monkey library to "libmonkey.so.original"
> 4) Get a list of all symbols in the Monkey library with: nm -D
> libmonkey.so.original | grep -i " T "
> 5) Create a 'man in the middle' library file called 'libmonkey.so'
> containing interceptors for the 6 symbols
> 6) Run the executable program

Thank you. Now explain this in "standardese" and add to your paper.

You should also address the usefulness of the feature (or lack thereof) for
platforms that don't have shared libraries, or for situations in which the
content you need to intercept is provided as a static library.

> Interceptors are useful when dealing with pre-compiled libraries
> stripped of their symbols (which you might not even have the header
> file for).

I understand that, but that's a VERY restricted use and it doesn't address my
point. Please address why this feature is useful when we already have other
techniques. Why would someone use one instead of the other? Is this meant for
production use?

Also, would it be used at all when the headers are available? Are there better
options for when they are? Speaking of headers, with C++ mangled symbols, you
can tell what the arguments and often the return type are: is the feature
useful there too?
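
To illustrate the point about mangled names, here is a minimal sketch that
recovers the parameter types from an exported symbol. It assumes the Itanium
C++ ABI (GCC or Clang) and its `abi::__cxa_demangle` function from
`<cxxabi.h>`. The helper name `demangle` and the symbol `_Z8OpenPipePKci` are
hypothetical, chosen only to match the "OpenPipe" example quoted further down.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (Itanium C++ ABI: GCC, Clang)
#include <cstdlib>    // std::free
#include <memory>
#include <string>

// Hypothetical helper: returns the human-readable form of a mangled symbol,
// or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // __cxa_demangle returns a malloc'ed buffer that the caller must free.
    std::unique_ptr<char, void (*)(void*)> text(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status), std::free);
    if (status == 0 && text) {
        return std::string(text.get());
    }
    return std::string(mangled);
}
```

Calling `demangle("_Z8OpenPipePKci")` yields `OpenPipe(char const*, int)`, so
the parameter list is available from the symbol table alone. Under this ABI
the return type of an ordinary (non-template) function is not encoded in the
name, which is why it can only "often" be recovered.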

> Yeah but there isn't a hell of a lot a programmer can do when
> debugging a program linked with a proprietary library they know
> nothing about. Especially if the function names aren't mangled and
> they don't have the header file. Using "objdump" or "Dependency
> Walker", they can see that there's an exported function called
> "OpenPipe" but they don't know the return type nor the parameter
> types. This is where you need an interceptor that preserves the stack
> and the registers.

This argument does not address my comment about requiring the runtime (which
is not part of the library you're trying to debug) to provide the shadow stack.

And speaking of Intel CET, you need to show whether the interceptor works with
the technology. Because if it doesn't, then you can really only use it in
debug environments. But I think that this is where you want the feature
anyway.

> I haven't so far been able to preserve every single register. I need
> at least one register for the jump location, and the best choice for
> this on x86_32 is ECX, while on x86_64, the best choice is R11. But
> you could have a bizarre program that has its own calling convention
> with its own shared libraries -- for example a calling convention in
> which the return value is put in R11, then the interceptor won't work.
> Of course though you could tweak the interceptor to use a different
> register.

Why does the C++ Standard need to deal with non-standard calling conventions?
Why does a compiler need to deal with calling conventions it itself doesn't
support? Why does C++ need to deal with routines that aren't written in C++ or
are extern "C"?

Anyway, your answer is "any state", which means your possible implementation
is wrong and incomplete. You must save the SSE/AVX register state; on Windows,
some of those registers (or a portion of each register) are callee-saved.

> When tailoring an interceptor to a particular calling convention,
> there are a few kinds of register preservation that must take place:
> Before the jump to the original function, we must preserve:
> A) All of the function argument registers
> B) The callee-saved registers we will use
>
> After the jump has taken place, we must preserve:
> C) All of the return-value registers
> B) The callee-saved registers we will use

I know that. So why is your code saving more than it needs to? Why is it so
convoluted when it has scratch registers available that it can use?

> I'm actually thinking now that I'll write a function something like
> the following:
>
> auto WriteInterceptor( void (*Inward)(void*), void
> (*Original)(void), void (*Outward)(void*), void *arg_to_forward ) ->
> void(*)(void)
>
> You supply this function with 4 arguments:
> 1) The address of the "Inward" code (e.g. a function that locks a mutex)
> 2) The address of the original function
> 3) The address of the "Outward" code (e.g. a function that unlocks a mutex)
> 4) User-defined data to be passed in Step 1 and Step 3
>
> This function will write the machine code for the interceptor (so it
> will allocate a page of memory, write the machine code and mark the
> page as executable), and return a pointer to the interceptor.

So it's now JIT and it reduces even further the targets it can run on. There
are some hyper-secure environments that don't allow JITting. And how is it
going to intercept calls to Original without knowing where the calls to it
are?
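
For comparison, when the signature of Original is known (for example from a
header), the Inward / Original / Outward sequence quoted above can be written
in ordinary C++ with no generated machine code. The following is only a
sketch under that assumption; `call_with_hooks`, `note_entry`, `note_exit` and
`add` are hypothetical names, not part of the proposal.

```cpp
#include <string>
#include <utility>

// Hypothetical typed wrapper: runs `inward`, then `original`, then `outward`.
// It requires the signature R(Args...) to be known at compile time.
template <typename R, typename... Args, typename... CallArgs>
R call_with_hooks(void (*inward)(void*), R (*original)(Args...),
                  void (*outward)(void*), void* user_data,
                  CallArgs&&... args) {
    inward(user_data);
    // The guard runs `outward` after `original` returns, and also if it throws.
    struct Guard {
        void (*fn)(void*);
        void* data;
        ~Guard() { fn(data); }
    } guard{outward, user_data};
    return original(std::forward<CallArgs>(args)...);
}

// Example hooks: record entry and exit in a std::string passed as user data.
void note_entry(void* p) { *static_cast<std::string*>(p) += '<'; }
void note_exit(void* p) { *static_cast<std::string*>(p) += '>'; }

// Stand-in for the function being intercepted.
int add(int a, int b) { return a + b; }
```

With a `std::string trace`, the call
`call_with_hooks(note_entry, add, note_exit, &trace, 2, 3)` returns 5 and
leaves `trace` equal to `"<>"`. Unlike the proposed interceptor, this does
nothing for callers that already hold the address of `add`, and it cannot be
written at all when the signature is unknown.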

Anyway, I really want you to reason about why this needs to be in the
Standard. Work on the Motivation section. Please explore the options that
exist with debuggers and tracing tools like ltrace, and explain why this
feature is needed when those others already exist. Explain the conditions
under which the feature can work at all and therefore the conditions where it
can't; in doing so, please reason how often this is going to be needed. And
finally, please reason why this should be supported for the next couple of
decades in a standard fashion.

Or don't. How many times in your professional career have you needed this? I
think your chance of getting this paper even past the CWG is less than
FLT16_EPSILON. So do you want to spend even more of your time, and the
committee's, on it?

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Principal Engineer - Intel DCAI Platform & System Engineering

Received on 2024-08-02 14:19:23