sg7: Re: [SG7] Thompson Turing lecture

From: David Rector <davrec_at_[hidden]>
Date: Sun, 16 May 2021 20:16:24 -0400

> On May 15, 2021, at 9:38 PM, Isabella Muerte via SG7 <sg7_at_[hidden]> wrote:
>
> I would just like to mention that with clang and gcc plugins *anything* is possible as well, and even trusting that your build system is printing out the correct command that it is executing (even if its defined in your Makefile or build.ninja) is also a level of implicit trust that we all currently exhibit in our builds. There is, quite honestly, very few things stopping a make implementation or ninja installation from detecting your compiler, injecting something with -plugin, and printing out something different. On some platforms it's possible to detect what programs might be watching or logging behavior through tools like detours, ptrace, or even mach ports, which allows for hiding this behavior under observation. Some folks also just inject assembly that is never executed, but will cause disassemblers to crash.
>
> Right now the ability for Rust proc-macros[^1] to "execute anything" is a known issue (Mara Bos, the maintainer of wg21.link <http://wg21.link/>, for instance showed a "SFINAE in Rust" tweet where she just calls fork() on the compiler to find the correct code that compiles out of a given set of statements.
>
> Specifications like WASM (not WASI[^2], which gives access to certain system level resources) do a lot to alleviate this. There are hard restrictions on WASM that can be further reduced by the host's choice and escaping WASM is much harder to do (especially if there is no JIT, but instead a simple bytecode interpreter). There's also some literature[1] on static analysis of stack based VM instruction sets that show someone can statically analyze how valid a given instruction sequence is, and if it violates anything the host might consider to be incorrect, invalid, or undefined behavior[^3].
>
> Rust's MIR interpreter[2] predates WASM 1.0, and if it were to be retrofitted to support it, it would have to have MIR intrinsics due to the additional use of metadata and behavior it tries to diagnose. However, the only thing truly stopping Rust plugins from compiling to WASM is the lack of the reference types[3] and function-ref papers added to the WASM specification. It could most likely end up within the Rust compiler itself at some point as it would only be an ABI break, not an API break.
>
> That said, having spoken with Mara and @eddyb from github (who is effectively the author of the protocol for which proc-macros can speak with the rust compiler), the Rust compiler team is actually working to lock some of these things down, however the core compiler team is only a bit larger than the team working on MSVC.
>
> Regarding reflection, even if we specify it in such a way that it cannot do arbitrary things, as long as compilers have a mechanism to communicate with a process outside, anything is possible. Trying to design it to be as sandboxed as possible is, quite honestly, like trying to solve the piracy issue in game development.
>
> [1]: "Virtual Machines" *Iain D. Craig *ISBN-13: 978-1849969802
> [2]: https://github.com/rust-lang/miri <https://github.com/rust-lang/miri>
> [3]: https://github.com/WebAssembly/reference-types <https://github.com/WebAssembly/reference-types>
>
> [^1]: Not to be confused with the hygenic macro syntax, proc-macros are compiled as a plugin against the Rust compiler. Hygenic macros do not suffer from this issue, as they only operate on the AST with a custom syntax.
>
> [^2]: In part based off of the work done on CloudABI, to the point that when WASI was announced some of the documentation was copied word for word, though the CloudABI people didn't make a fuss about it. WASI allows for things like file I/O and networking, however its specified in such a way that even these can be sandboxed by a host if desired.
>
> [^3]: Iain Craig details how both the JVM and CLR (at the time of writing in 2005, things have changed since then of course) allow for analysis of allowed (but undesired) behavior. However I believe to date, no one has taken advantage of the work detailed in the book and finding the book itself is in fact very difficult these days, looking at prices on Amazon and elsewhere.

This is a really excellent analysis.

The point on plugins is particularly apt because plugins should probably be viewed alongside Circle etc. as competitors of reflection/metaprogramming standards.

Viewed this way, it feels like the insistence on consteval safety has left room for a lower level language of compile-time instructions, which is being filled by Circle, clang/gcc plugins, and perhaps Rust proc-macros.

I know substantial and necessary work is being done to make constant evaluation more efficient, at least in clang. But it seems to me there is only so much that can be done for large metafunctions. And I suspect the scale of things users will want to do with these new features (e.g. calling a huge metafunction on every member of a huge namespace) will quickly lay bare any inefficiencies of constant evaluation relative to any plugin-like alternative which allows calling already-compiled-and-optimized binaries at compile time, thereby driving users to these alternatives. (How much of a slowdown in build times will users be willing to put up with, day after day, in the name of “safety"/conformance?)

I have no expertise in this area but IIUC the solutions Isabella mentions seem promising: compiling metafunction libraries to WASM, augmenting WASM by requiring other static analyses to be performed during meta-compilation, etc. And her point that any vulnerabilities apparently introducible via proposed meta programming features are already present owing to the existence of compiler plugins seems unassailable. I would love to see this line of thinking explored further.

Received on 2021-05-16 19:16:31