
Re: [std-proposals] Function Pointer from Lambda with Captures

From: Edward Catmur <ecatmur_at_[hidden]>
Date: Sun, 16 Apr 2023 18:50:56 -0300
On Sun, 16 Apr 2023 at 13:14, Thiago Macieira via Std-Proposals <
std-proposals_at_[hidden]> wrote:

> On Sunday, 16 April 2023 12:13:04 -03 Frederick Virchanza Gotham via
> Std-Proposals wrote:
> > Writing a 44-byte thunk with only two 64-bit values substituted into it
> > is nowhere near as extreme as what they're doing in that paper. If I
> > wanted to do what they're doing in that paper, I would just invoke 'g++'
> > from my program to create a shared library and then I'd use 'dlopen' on
> > the shared library.
> Except that their proposal is well-researched to the point of finding
> obscure features, and known to work. It comes from people who have a lot
> of experience with assembly, ABIs, and compilers. By your own admission,
> you started learning assembly just one month ago.
> So please accept free advice when given by people who have a lot more
> experience than you in this area.
> > When you make a function call on x86_64, the first six int/pointer
> I know *exactly* how function calls work on x86-64. My dayjob is not only
> knowing x86-64 at this level, but one and two levels below this level
> (uop/microarchitectural and the physical implementation). In fact, the
> reason I am aware of the SELinux problems I was describing is *because* I
> was working with the glibc/binutils team to add an optimisation to
> Linux/ELF that required a bit of JIT inside the dynamic loader, and ran
> afoul of protections. Fortunately, since their work was an optimisation,
> it can gracefully fail and fall back to the existing status quo.
> They haven't published the proposal yet, so I can't give you a link.
> > Thiago, you mentioned that the stack is not executable. This is an
> > unnecessary restriction imposed by some operating systems on some
> > processes -- it is not a limit of computer science nor of the x86_64
> > CPU instruction set.
> Correct. It's not a requirement imposed by the CPU on x86_64, but it is a
> fact of life. There's no option inside the language to turn that off; it's
> a compiler option and each compiler has a different one. I don't know how
> other architectures work and whether they even allow this. See Edward's
> reply about M32R.
> Moreover, non-executable stacks are a security protection feature,
> preventing code injection attacks. You will not get anyone to accept
> turning that off for new code: the compiler options are retained to deal
> with old code that (ab)used the functionality. Here's my advice: don't try
> to argue this, it will lead people to dismiss your proposal and it becomes
> Dead On Arrival. Accept that the stack is non-executable and move on.
> In fact, accept that you cannot have write-and-execute pages at all,
> anywhere. The best you can do is mmap() a new, writable page, to which you
> write the new code, then you mprotect() it to make it read-only &
> executable. That means you have a minimum allocation of a page size (4096
> on x86). If your thunk is 44 bytes, then you have a 9200% overhead.
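
A minimal C++ sketch of the mmap()/mprotect() sequence described above, assuming a POSIX system. The helper names publish_code and page_overhead_percent are hypothetical, and the bytes copied into the page are never executed here. The page is never writable and executable at the same time, but a security policy such as SELinux may still refuse the mprotect() call, in which case the sketch returns nullptr. On some architectures (AArch64, for example) an instruction-cache flush would also be needed before jumping into the page.

```cpp
#include <sys/mman.h>  // mmap, mprotect, munmap
#include <unistd.h>    // sysconf
#include <cassert>
#include <cstddef>
#include <cstring>

// Percentage of the page that is wasted when one thunk occupies it alone.
double page_overhead_percent(std::size_t page_bytes, std::size_t thunk_bytes) {
    assert(thunk_bytes > 0 && thunk_bytes <= page_bytes);
    return 100.0 * static_cast<double>(page_bytes - thunk_bytes) /
           static_cast<double>(thunk_bytes);
}

// W^X publication: map a fresh read-write page, copy the code in, then
// switch the whole page to read + execute. Returns nullptr on any failure,
// including a policy refusing PROT_EXEC. The caller munmap()s one page.
void* publish_code(const unsigned char* code, std::size_t len) {
    const long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0 || len > static_cast<std::size_t>(page)) {
        return nullptr;
    }
    const std::size_t page_size = static_cast<std::size_t>(page);
    void* mem = mmap(nullptr, page_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        return nullptr;
    }
    std::memcpy(mem, code, len);  // the only moment the page is writable
    if (mprotect(mem, page_size, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, page_size);
        return nullptr;
    }
    return mem;  // a whole page spent on len bytes of code
}
```

With a 4096-byte page and a 44-byte thunk, page_overhead_percent(4096, 44) comes out at about 9209, which is the roughly 9200% figure quoted above.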

libffi has a nice optimization where it maps the same page at two different
addresses, one mapping writeable and the other executable. This is probably
even more of an issue for security, though.
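
A Linux-only sketch of the double-mapping idea, assuming glibc 2.27 or later for memfd_create(). It is not libffi's code, and the DualView, make_dual_view and free_dual_view names are hypothetical. Both views refer to the same pages, so bytes written through rw are immediately visible through rx. It also shows why this is a security concern: anyone who learns the rw address can rewrite code that is mapped executable.

```cpp
#include <sys/mman.h>  // mmap, munmap, memfd_create (glibc >= 2.27)
#include <unistd.h>    // ftruncate, close
#include <cassert>
#include <cstddef>

// Two views of the same memory object: one writable, one executable.
struct DualView {
    unsigned char* rw = nullptr;  // write thunk bytes through this view
    unsigned char* rx = nullptr;  // hand out addresses from this view
    std::size_t size = 0;
};

// Returns a DualView with null pointers if any step fails.
DualView make_dual_view(std::size_t size) {
    assert(size > 0);
    DualView v;
    int fd = memfd_create("thunks", 0);
    if (fd < 0) {
        return v;
    }
    if (ftruncate(fd, static_cast<off_t>(size)) != 0) {
        close(fd);
        return v;
    }
    void* rw = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void* rx = mmap(nullptr, size, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
    close(fd);  // the mappings keep the memory object alive
    if (rw == MAP_FAILED || rx == MAP_FAILED) {
        if (rw != MAP_FAILED) munmap(rw, size);
        if (rx != MAP_FAILED) munmap(rx, size);
        return v;
    }
    v.rw = static_cast<unsigned char*>(rw);
    v.rx = static_cast<unsigned char*>(rx);
    v.size = size;
    return v;
}

void free_dual_view(DualView& v) {
    if (v.rw != nullptr) munmap(v.rw, v.size);
    if (v.rx != nullptr) munmap(v.rx, v.size);
    v = DualView{};
}
```

Unlike the mprotect() approach, the executable view never changes permissions, so new thunks can be written into the same page later without another system call.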

> > If this 'thunk' feature were added to the Standard, it could be
> > implemented in two ways:
> > (A) The efficient way, exactly how I've implemented it
> > (B) The inefficient way, by having a thread_local function pointer
> I'm pretty sure that neither (A) nor (B) work. Your (A) requires JIT, which
> isn't available everywhere. Your (B) requires that only one such state be
> held per thread -- so what happens if you try to chain them?
> > If a given architecture is unwilling or unable to do A, then the
> > following is the inefficient alternative:
> Assume that a number indistinguishable from 100% is going to be unable to
> do (A), then rephrase your proposal with the benefit versus the actual
> cost it's going to have.
> > I think the only place where this inefficient implementation could fall
> > down is if you have a lambda defined inside a recursive function... but
> > you'd just need to make sure that the function pointer is invoked before
> > the function is re-entered (unless of course, upon re-entry, you
> > accommodate the function pointer not having been invoked yet).
> Do so in your proposal.
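
A minimal sketch of approach (B) as described in the quoted text: one thread_local closure plus a single plain function that forwards to it. The Callback type (int(*)(int)) and the make_callback name are hypothetical. It illustrates the chaining problem: registering a second closure on the same thread silently redirects a pointer handed out earlier, which is also what goes wrong when a recursive function re-enters before the pointer is invoked.

```cpp
#include <cassert>
#include <functional>
#include <utility>

using Callback = int (*)(int);  // hypothetical C-style callback signature

// The single per-thread closure slot that approach (B) relies on.
thread_local std::function<int(int)> t_current;

// The one plain function every caller receives; it forwards to t_current.
int single_trampoline(int x) {
    assert(t_current && "no closure registered on this thread");
    return t_current(x);
}

// Stores f and returns a plain function pointer. Any pointer returned
// earlier now forwards to f as well, because there is only one slot.
template <class F>
Callback make_callback(F f) {
    t_current = std::move(f);
    return &single_trampoline;
}
```

Every call to make_callback returns the same address, so a C API that holds two such callbacks at once would see both invoke the most recently registered closure.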

There's another technique that works as long as you're OK with having a
limit to the number of simultaneous thunks outstanding. Basically, instead
of a single thread-local data pointer you have a fixed-size array of them,
and a same-sized array of function pointers to hand out to the
callback-based API. Demo: https://godbolt.org/z/cGra4Kjxn
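
A minimal C++17 sketch of the fixed-size-array idea, written independently of the linked demo; the int(*)(int) callback type, the limit of four slots, and the acquire/release names are all assumptions. Each slot index instantiates its own plain function, so the callback-based API receives a distinct pointer per outstanding closure, and acquire() returns std::nullopt once every slot is in use.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>

using Callback = int (*)(int);     // hypothetical C-style callback signature
constexpr std::size_t kSlots = 4;  // maximum simultaneous thunks per thread

// Per-thread closure storage; an empty std::function marks a free slot.
thread_local std::array<std::function<int(int)>, kSlots> t_slots;

// One distinct plain function per slot index.
template <std::size_t Slot>
int slot_trampoline(int x) { return t_slots[Slot](x); }

template <std::size_t... I>
constexpr std::array<Callback, kSlots> make_table(std::index_sequence<I...>) {
    return {{&slot_trampoline<I>...}};
}

// The same-sized array of function pointers handed out to the API.
constexpr std::array<Callback, kSlots> kTable =
    make_table(std::make_index_sequence<kSlots>{});

// Stores f in a free slot and returns {slot, pointer}, or nullopt if full.
template <class F>
std::optional<std::pair<std::size_t, Callback>> acquire(F f) {
    for (std::size_t i = 0; i < kSlots; ++i) {
        if (!t_slots[i]) {
            t_slots[i] = std::move(f);
            return std::make_pair(i, kTable[i]);
        }
    }
    return std::nullopt;
}

// Frees a slot once the API will no longer call its pointer.
void release(std::size_t slot) {
    assert(slot < kSlots);
    t_slots[slot] = nullptr;
}
```

Because the storage is thread_local, this version assumes each callback runs on the thread that acquired it; a global array guarded by a mutex would lift that restriction at the cost of locking.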

Received on 2023-04-16 21:51:09