Re: [SG7] Thompson Turing lecture

From: David Rector <davrec_at_[hidden]>
Date: Tue, 18 May 2021 12:35:53 -0400
> On May 16, 2021, at 10:46 PM, Andrew Sutton <asutton.list_at_[hidden]> wrote:
>
> I'm pretty sure Herb is worried about standardizing a language feature that enshrines mistrust in and allows subverting translation, not what implementations do outside of that. Y'all are talking about what compilers might do above and beyond what the language requires.
>
> If you want precompiled constexpr functions and want to make them available to clients, then modules are your answer.
>
> I strongly suspect that this "lower level language" you refer to will essentially be reduced to functions whose definitions are not reachable. Everything else will be evaluable, and precompiled, optimized definitions will be exported by modules.

I am not an expert in the full span of compiler technologies implicated in this discussion, but let us differentiate here between "optimizing" consteval functions and optimizing lowered code. That distinction clarifies why it intrigues me, despite the safety concerns, to consider compiling and calling particularly large metafunctions as binaries (or something close to it), rather than via interpreters or however compilers currently handle constant evaluation.

Compiling a consteval function will "optimize" `2+3` to `5`, but that seems to be about it, at least in the major compilers. For example, as you and Wyatt et al. point out in P1240, no major compiler optimizes subobject access during constant evaluation. (Some detailed testing based on the P1240 example here: https://lists.llvm.org/pipermail/cfe-dev/2021-March/067809.html; see Richard's response in the next message as well, which explains away GCC's apparent huge advantage and notes improvements coming in clang.)
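
To make the distinction concrete, here is a toy sketch of my own (not the example from P1240 or the cfe-dev thread) of the kind of consteval code where, per the tests referenced above, the compilers fold the arithmetic but do not optimize away the repeated subobject accesses:

    struct Vec { int x, y, z; };

    consteval int sum_components(Vec v, int reps) {
        int total = 0;
        for (int i = 0; i < reps; ++i)
            total += v.x + v.y + v.z;  // subobject accesses, re-evaluated each iteration
        return total;
    }

    // Folds to a constant, but only after the evaluator grinds through the loop:
    static_assert(sum_components({1, 2, 3}, 1000) == 6000);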

I am not aware of any further optimizations presently performed when compiling consteval functions to a module, in clang at least, though I suppose this could be done. Attributes, too, could perhaps be used to instruct the compiler to spend extra time somehow "optimizing" a given consteval function, but such optimizations would have to be invented from scratch.
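
Purely as a sketch of what I mean (the attribute spelling below is hypothetical; nothing like it exists in any compiler today), one could imagine something like:

    export module heavy.meta;

    // Hypothetical vendor attribute asking the implementation to spend extra
    // time pre-optimizing this consteval function when emitting the module:
    [[vendor::optimize_consteval]]
    export consteval unsigned long long fib(unsigned n) {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }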

Why is this important? It is not yet clear to me that constant evaluation done via interpretation, or however it is currently done in each compiler, will be able to efficiently perform the kind of heavy-duty metaprogramming users will increasingly ask of it, relative to alternatives that can perform metaprogramming via binary code (like plugins and, IIUC, Circle). There is work being done on constant evaluation speed in clang at least, but will it be enough? If it is indeed viable, i.e. close enough in speed to the binary alternatives, backslaps all around — that would be a huge achievement, and this discussion is largely moot.

Until then, it seems to me C++ has left room for these "lower level" (i.e. faster and more capable, but less safe) means of metaprogramming. The slow development of the standard features, while understandable, has magnified this room and led to some solid exploration of it.

Imagine, for example, if clang were to introduce a stable API/ABI for its AST. If I understand Circle correctly, it would then be straightforward for the Circle folks to add support for this API — allowing the user to call clang functions (e.g. `void myMetaFunc(clang::Decl *)`) during constant evaluation by passing reflections to them (`myMetaFunc(^foo);`). Basically plugins, but now callable on individual declarations via the language, rather than on an entire translation unit via command line options.
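
To illustrate (a hypothetical sketch only: clang has no stable AST API/ABI today, and `^foo` is Circle's reflection operator), such a precompiled metafunction might look roughly like this, built against the existing clang AST classes:

    #include "clang/AST/Decl.h"
    #include "clang/AST/DeclCXX.h"
    #include "llvm/Support/Casting.h"

    // Compiled once into a shared library; the (hypothetical) language support
    // would let the user invoke it during constant evaluation as
    //   myMetaFunc(^foo);
    void myMetaFunc(clang::Decl *D) {
      if (auto *RD = llvm::dyn_cast<clang::CXXRecordDecl>(D)) {
        for (const clang::FieldDecl *F : RD->fields()) {
          // Full AST access here: inspect types, attributes, source locations,
          // or hand F off to any existing clang-based tooling.
          (void)F;
        }
      }
    }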

This would offer the user full reflection information (including expressions etc.), interfaceable via a mature API, arbitrary AST transformation capabilities, and lightning-fast calls to precompiled binary libraries wherever desired — all atop the other functionality Circle currently supplies. Such a language would be a compelling metaprogramming-centric C++ offshoot. And it would arguably adhere more closely to the "zen" of C++, in that it would give users as much rope as they wish to make their (compile-time) code ever more powerful and efficient, leaving safety as a matter of expertise, encapsulation, etc.

To be sure, the currently proposed approach is definitely cleaner and safer, particularly for injection, and so long as it is in the same ballpark in terms of efficiency and capability, it wins. But it needs to be committed to staying in that ballpark, lest the "room" remain and C++ offshoots become more viable.

>
> But what about non-modular code? I don't care. Our detours have taken reflection out of scope for 23, so I'm content to design for the future only.
>
> As for the analogy relating safe metaprogramming to preventing piracy... Preventing piracy requires basically zero trust everywhere. A language specification is not software.
>
> We need to understand where the language interacts with vendor concerns and design accordingly. This is why Circle doesn't work for C++. It doesn't, and won't ever, respect major vendor concerns. (Ask me how to design accordingly.)
>
> Anyhoo... I continue to await comments or questions on the status quo proposal.
>
>
> On Sun, May 16, 2021, 8:16 PM David Rector via SG7 <sg7_at_[hidden]> wrote:
>
>
>> On May 15, 2021, at 9:38 PM, Isabella Muerte via SG7 <sg7_at_[hidden]> wrote:
>>
>> I would just like to mention that with clang and gcc plugins *anything* is possible as well, and even trusting that your build system is printing out the correct command that it is executing (even if it's defined in your Makefile or build.ninja) is a level of implicit trust that we all currently exhibit in our builds. There are, quite honestly, very few things stopping a make implementation or ninja installation from detecting your compiler, injecting something with -plugin, and printing out something different. On some platforms it's possible to detect what programs might be watching or logging behavior through tools like detours, ptrace, or even mach ports, which allows for hiding this behavior under observation. Some folks also just inject assembly that is never executed, but will cause disassemblers to crash.
>>
>> Right now the ability for Rust proc-macros[^1] to "execute anything" is a known issue. (Mara Bos, the maintainer of wg21.link, for instance, showed a "SFINAE in Rust" tweet where she just calls fork() on the compiler to find the correct code that compiles out of a given set of statements.)
>>
>> Specifications like WASM (not WASI[^2], which gives access to certain system-level resources) do a lot to alleviate this. There are hard restrictions on WASM, which the host can restrict further by choice, and escaping WASM is much harder to do (especially if there is no JIT, but instead a simple bytecode interpreter). There's also some literature[1] on static analysis of stack-based VM instruction sets showing that one can statically analyze how valid a given instruction sequence is, and whether it violates anything the host might consider to be incorrect, invalid, or undefined behavior[^3].
>>
>> Rust's MIR interpreter[2] predates WASM 1.0, and if it were to be retrofitted to support it, it would need MIR intrinsics due to the additional metadata and behavior it tries to diagnose. However, the only thing truly stopping Rust plugins from compiling to WASM is that the reference types[3] and function-ref papers have not yet been added to the WASM specification. It could most likely end up within the Rust compiler itself at some point, as it would only be an ABI break, not an API break.
>>
>> That said, having spoken with Mara and @eddyb from GitHub (who is effectively the author of the protocol by which proc-macros speak with the Rust compiler), the Rust compiler team is actually working to lock some of these things down; however, the core compiler team is only a bit larger than the team working on MSVC.
>>
>> Regarding reflection, even if we specify it in such a way that it cannot do arbitrary things, as long as compilers have a mechanism to communicate with a process outside, anything is possible. Trying to design it to be as sandboxed as possible is, quite honestly, like trying to solve the piracy issue in game development.
>>
>> [1]: "Virtual Machines", Iain D. Craig, ISBN-13: 978-1849969802
>> [2]: https://github.com/rust-lang/miri
>> [3]: https://github.com/WebAssembly/reference-types
>>
>> [^1]: Not to be confused with the hygienic macro syntax; proc-macros are compiled as a plugin against the Rust compiler. Hygienic macros do not suffer from this issue, as they only operate on the AST with a custom syntax.
>>
>> [^2]: In part based on the work done on CloudABI, to the point that when WASI was announced some of the documentation was copied word for word, though the CloudABI people didn't make a fuss about it. WASI allows for things like file I/O and networking; however, it's specified in such a way that even these can be sandboxed by a host if desired.
>>
>> [^3]: Iain Craig details how both the JVM and CLR (at the time of writing in 2005; things have changed since then, of course) allow for analysis of allowed (but undesired) behavior. However, I believe that to date no one has taken advantage of the work detailed in the book, and the book itself is in fact very difficult to find these days, judging by prices on Amazon and elsewhere.
>
>
> This is a really excellent analysis.
>
> The point on plugins is particularly apt because plugins should probably be viewed alongside Circle etc. as competitors of reflection/metaprogramming standards.
>
> Viewed this way, it feels like the insistence on consteval safety has left room for a lower level language of compile-time instructions, which is being filled by Circle, clang/gcc plugins, and perhaps Rust proc-macros.
>
> I know substantial and necessary work is being done to make constant evaluation more efficient, at least in clang. But it seems to me there is only so much that can be done for large metafunctions. And I suspect the scale of things users will want to do with these new features (e.g. calling a huge metafunction on every member of a huge namespace) will quickly lay bare any inefficiencies of constant evaluation relative to any plugin-like alternative which allows calling already-compiled-and-optimized binaries at compile time, thereby driving users to these alternatives. (How much of a slowdown in build times will users be willing to put up with, day after day, in the name of "safety"/conformance?)
>
> I have no expertise in this area but IIUC the solutions Isabella mentions seem promising: compiling metafunction libraries to WASM, augmenting WASM by requiring other static analyses to be performed during meta-compilation, etc. And her point that any vulnerabilities apparently introducible via the proposed metaprogramming features are already present owing to the existence of compiler plugins seems unassailable. I would love to see this line of thinking explored further.
> --
> SG7 mailing list
> SG7_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg7

