std-proposals

Re: [std-proposals] Interceptor Function (preserve stack and all registers)

From: Marcin Jaczewski <marcinjaczewski86_at_[hidden]>
Date: Sun, 11 Aug 2024 15:38:39 +0200
On Fri, 2 Aug 2024 at 16:19, Thiago Macieira via Std-Proposals
<std-proposals_at_[hidden]> wrote:
>
> On Friday 2 August 2024 04:29:11 GMT-7 Frederick Virchanza Gotham via
> Std-Proposals wrote:
> > 1) Create a library called "libmonkey.so" or "monkey.dll" that exports
> > 6 unmangled functions
> > 2) Create an executable program "prog" or "prog.exe" and link it with
> > the Monkey library (e.g. g++ -o prog prog.cpp -lmonkey)
> > 3) Now rename the Monkey library to "libmonkey.so.original"
> > 4) Get a list of all symbols in the Monkey library with: nm -D
> > libmonkey.so.original | grep -i " T "
> > 5) Create a 'man in the middle' library file called 'libmonkey.so'
> > containing interceptors for the 6 symbols
> > 6) Run the executable program
>
> Thank you. Now explain this in "standardese" and add to your paper.
>
> You should also address the usefulness of the feature (or lack thereof) for
> platforms that don't have shared libraries, or for situations in which the
> content you need to intercept is provided as a static library.
>
> > Interceptors are useful when dealing with pre-compiled libraries
> > stripped of their symbols (which you might not even have the header
> > file for).
>
> I understand that but that's a VERY restricted use and doesn't address my
> point. Please address why this feature is useful when we already have other
> techniques. Why would someone use one instead of the other? Is this meant for
> production use?
>

I am not interested in the "Interceptor" itself, but `goto function;` could be useful.
A guaranteed tail call in the standard would enable many techniques
that are either unavailable in C++ or not portable (like computed `goto`).
Probably the best example is the main loop of an interpreter:


https://godbolt.org/z/f6zdc756z
```
struct Cpu
{
    using Op = void (*)(Cpu&);

    unsigned char* code;
    int* mem;
    int off;

    unsigned char imm() { return *(code++); }
    int& refMem() { return *(mem + imm()); }
    int& refMemOff() { return *(mem + imm() + off); }

    Op nextFunc() { return ops[imm()]; }

    static void exit(Cpu&) {}
    static void loadOff(Cpu& c) { c.off = c.refMem(); c.nextFunc()(c); }
    static void loadOffImm(Cpu& c) { c.off = c.imm(); c.nextFunc()(c); }
    static void loadOffNext(Cpu& c) { c.off = c.refMemOff(); c.nextFunc()(c); }
    static void storeOff(Cpu& c) { c.refMem() = c.off; c.nextFunc()(c); }
    static void copy1(Cpu& c) { c.refMemOff() = c.refMem(); c.nextFunc()(c); }
    static void copy2(Cpu& c) { c.refMem() = c.refMemOff(); c.nextFunc()(c); }

    static void add(Cpu& c) { c.refMem() += c.refMemOff(); c.nextFunc()(c); }

    static void jump(Cpu& c) { if (c.off) { c.code += (char)c.imm(); } else { (void)c.imm(); } c.nextFunc()(c); }



    constexpr static Op ops[] = {
       &Cpu::exit,
       &Cpu::loadOff,
       &Cpu::loadOffImm,
       &Cpu::loadOffNext,
       &Cpu::storeOff,
       &Cpu::copy1,
       &Cpu::copy2,
       &Cpu::add,
       &Cpu::jump,
    };
};




int main()
{
    int mem[] =
    {
        0,
        0,
        0,
        0,

        40,
        2,
        3,
        4,
    };

    unsigned char code[] =
    {
        /*add*/ 7, 4, 0,
        /*add*/ 7, 5, 0,
        /*loadOff*/ 1, 0,
        /*exit*/ 0,
    };

    Cpu c { code, mem, 0 };

    c.nextFunc()(c);

    return c.off;
}
```

This `Cpu` can run an infinite loop and it will work fine when the
compiler optimizes the calls into jumps, but this is not portable and
it is brittle: if any of the functions has a non-trivial local variable,
the tail call is no longer optimized and the calling code will overflow
the stack.
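
For contrast, here is a minimal sketch of a trampoline, one way to keep the stack depth constant in portable C++ today. It is not the code from the godbolt link: `Vm`, `Step`, `run` and the three opcodes are hypothetical names made up for this illustration. Each handler returns the next handler instead of calling it, and a plain loop does the dispatch, at the cost of a call and a return per instruction.

```cpp
#include <cassert>

struct Vm;
struct Step;
using Fn = Step (*)(Vm&);
// A function pointer type cannot name itself as its return type,
// so the "next handler" is wrapped in a small struct (hypothetical name).
struct Step { Fn fn; };

struct Vm
{
    const unsigned char* code;  // instruction pointer
    long acc;                   // accumulator
    long counter;               // remaining loop iterations

    unsigned char imm() { return *code++; }
    Step next() { return Step{ ops[imm()] }; }  // decode the next opcode

    // Opcode 0: stop the trampoline.
    static Step halt(Vm&) { return Step{ nullptr }; }
    // Opcode 1: add an immediate byte to the accumulator.
    static Step addImm(Vm& v) { v.acc += v.imm(); return v.next(); }
    // Opcode 2: decrement the counter; jump back `back` bytes while it is positive.
    static Step loop(Vm& v)
    {
        unsigned char back = v.imm();
        if (--v.counter > 0) { v.code -= back; }
        return v.next();
    }

    static constexpr Fn ops[] = { &Vm::halt, &Vm::addImm, &Vm::loop };
};

// Runs `program` and returns the accumulator.
long run(const unsigned char* program, long iterations)
{
    assert(program != nullptr);
    Vm vm{ program, 0, iterations };
    Step s = vm.next();
    while (s.fn) { s = s.fn(vm); }  // every handler returns before the next one starts
    return vm.acc;
}
```

For example, with the program `{ 1, 2,  2, 4,  0 }` (add 2, loop back 4 bytes, halt), `run(program, 1000000)` executes a million iterations without the stack growing.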

If we had something like `goto &func;`, it could work as
a mechanism for forcing tail calls.
I would use a slightly different syntax, like:

```
return goto expr;
```

This makes sure it is not confused with normal `goto`
or the computed `goto` extensions. In a way, it is a mix
of `return` and `goto`.

The basic behavior would be that the expression must evaluate
to a function with the same signature as the current function:

```
int foo(int i, int& j, void* p)
{
    return 42;
}

int bar(int i, int& k, void*)
{
    std::vector<int> v{ i, k };
    --i;
    return goto i < v.size() ? &foo : &bar;
}

int main()
{
    int k = 3;
    int r = bar(10, k, nullptr);
}
```

This is effectively a loop: the arguments are passed directly to the next
function, and local variables are destroyed after `return goto` evaluates its
expression but before jumping to the next function. This means that even if a
vector is used in the expression, as in `i < v.size()`, the compiler is still
allowed to clean up the stack.
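
The ordering described above (evaluate the target, destroy locals, then transfer control) can be approximated in today's C++ with an inner scope. This is only a sketch of the ordering, not of the proposed feature: the final call is an ordinary call, so eliminating its stack frame remains an optimization and is not guaranteed. `Guard`, `live_locals` and `live_at_foo` are hypothetical helpers added to make the destruction order observable.

```cpp
#include <vector>

using Fn = int (*)(int, int&, void*);

int live_locals = 0;   // number of Guard objects currently alive
int live_at_foo = -1;  // value of live_locals observed when foo runs

// Hypothetical helper that counts how many instances are alive.
struct Guard
{
    Guard() { ++live_locals; }
    ~Guard() { --live_locals; }
};

int foo(int, int&, void*)
{
    live_at_foo = live_locals;
    return 42;
}

int bar(int i, int& k, void* p)
{
    Fn next = nullptr;
    {
        Guard g;
        std::vector<int> v{ i, k };
        --i;
        // Pick the target while the locals are still alive.
        next = i < static_cast<int>(v.size()) ? &foo : &bar;
    }   // v and g are destroyed here, before control moves on
    return next(i, k, p);  // same arguments, forwarded as they are
}
```

Calling `bar(10, k, nullptr)` with `k == 3` reaches `foo` after several rounds of `bar`, and `foo` sees no live `Guard`, because every round released its locals before making the call.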

With a tail call, the whole dispatch overhead of each iteration in the `Cpu` example is:

```
        movzx eax, BYTE PTR [rax+1]
        jmp [QWORD PTR Cpu::ops[0+rax*8]]
```

This is similar to computed `goto` but you are not limited to one
function body and labels in it.
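
To make the comparison concrete, here is a small sketch of computed `goto` dispatch. It uses the "labels as values" GNU extension (`&&label` and `goto *ptr`), which GCC and Clang accept but which is not standard C++. The function name `runGoto` and the three opcodes are hypothetical and only serve this illustration. Note how every handler has to be a label inside one function body.

```cpp
// GNU extension (GCC/Clang): labels as values. Not standard C++.
// Opcodes (hypothetical): 0 = halt, 1 = add immediate, 2 = loop back.
long runGoto(const unsigned char* code, long counter)
{
    static void* const labels[] = { &&op_halt, &&op_add, &&op_loop };
    long acc = 0;

    goto *labels[*code++];          // dispatch the first opcode

op_add:
    acc += *code++;                 // add the immediate byte
    goto *labels[*code++];

op_loop:
    {
        unsigned char back = *code++;        // distance to jump backwards
        if (--counter > 0) { code -= back; }
    }
    goto *labels[*code++];

op_halt:
    return acc;
}
```

With the program `{ 1, 2,  2, 4,  0 }` and a counter of 3, the add handler runs three times before the halt label is reached.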

This should work for member functions too, but with a couple of caveats.
For example, you can't mix explicit `this` functions with old ones, as
both can have different calling conventions.

```
struct F
{
    void foo() {}
    void bar() { return goto &F::foo; }
};
```

If we assume that in every ABI the `this` pointer and references use the
same calling convention and are always considered volatile, we could
hack the tail call to allow calling functions with a different signature, like:

```
return goto &x->foo;
```

Right now `&x->foo` is illegal syntax, but we could give it meaning in
this context: "call `foo` with `x` as the `this` pointer and the same
arguments as the current function".

We could also add a shorthand:

```
return goto obj;
```

that means

```
return goto &obj.operator();
```

because this will be useful for lambdas:

```
auto foo = [&](this auto& self, int i) { };
auto bar = [&](this auto& self, int i) { if (i-- > 0) return goto self; else return goto foo; };
```







> Also, would it be used at all when the headers are available? Are there better
> options for when they are? Speaking of headers, with C++ mangled symbols, you
> can tell what the arguments and often the return type are: is the feature
> useful there too?
>
> > Yeah but there isn't a hell of a lot a programmer can do when
> > debugging a program linked with a proprietary library they know
> > nothing about. Especially if the function names aren't mangled and
> > they don't have the header file. Using "objdump" or "Dependency
> > Walker", they can see that there's an exported function called
> > "OpenPipe" but they don't know the return type nor the parameter
> > types. This is where you need an interceptor that preserves the stack
> > and the registers.
>
> This argument is not addressing my comment of requiring the runtime (which is
> not part of the library you're trying to debug) to provide the shadow stack.
>
> And speaking of Intel CET, you need to show whether the interceptor works with
> the technology. Because if it doesn't, then you can really only use it in
> debug environments. But I think that this is where you want the feature
> anyway.
>
> > I haven't so far been able to preserve every single register. I need
> > at least one register for the jump location, and the best choice for
> > this on x86_32 is ECX, while on x86_64, the best choice is R11. But
> > you could have a bizarre program that has its own calling convention
> > with its own shared libraries -- for example a calling convention in
> > which the return value is put in R11, then the interceptor won't work.
> > Of course though you could tweak the interceptor to use a different
> > register.
>
> Why does the C++ Standard need to deal with non-standard calling conventions?
> Why does a compiler need to deal with calling conventions it itself doesn't
> support? Why does C++ need to deal with routines that aren't written in C++ or
> are extern "C"?
>
> Anyway, your answer is "any state", which means your possible implementation
> is wrong and incomplete. You must save the SSE/AVX register state; on Windows,
> some of those registers (or a portion of the register) is callee-save.
>
> > When tailoring an interceptor to a particular calling convention,
> > there are a few kinds of register preservation that must take place:
> > Before the jump to the original function, we must preserve:
> > A) All of the function argument registers
> > B) The callee-saved registers we will use
> >
> > After the jump has taken place, we must preserve:
> > C) All of the return-value registers
> > B) The callee-saved registers we will use
>
> I know that. So why is your code saving more than it needs to? Why is it so
> convoluted when it has scratch registers available that it can use?
>
> > I'm actually thinking now that I'll write a function something like
> > the following:
> >
> > auto WriteInterceptor( void (*Inward)(void*), void
> > (*Original)(void), void (*Outward)(void*), void *arg_to_forward ) ->
> > void(*)(void)
> >
> > You supply this function with 4 arguments:
> > 1) The address of the "Inward" code (e.g. a function that locks a mutex)
> > 2) The address of the original function
> > 3) The address of the "Outward" code (e.g. a function that unlocks a mutex)
> > 4) User-defined data to be passed in Step 1 and Step 3
> >
> > This function will write the machine code for the interceptor (so it
> > will allocate a page of memory, write the machine code and mark the
> > page as executable), and return a pointer to the interceptor.
>
> So it's now JIT and it reduces even further the targets it can run on. There
> are some hyper-secure environments that don't allow JITting. And how is it
> going to intercept calls to Original without knowing where the calls to it
> are?
>
> Anyway, I really want you to reason about why this needs to be in the
> Standard. Work on the Motivation section. Please explore the options that
> exist with debuggers and tracing tools like ltrace, and explain why this
> feature is needed when those others already exist. Explain the conditions
> under which the feature can work at all and therefore the conditions where it
> can't; in doing so, please reason how often this is going to be needed. And
> finally, please reason why this should be supported for the next couple of
> decades in a standard fashion.
>
> Or don't. How many times in your professional career have you needed this? I
> think your chance of getting this paper even past the CWG is less than
> FLT16_EPSILON. So do you want to spend even more time on it and the
> committee's time?
>
> --
> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
> Principal Engineer - Intel DCAI Platform & System Engineering

Received on 2024-08-11 13:38:57