I wanted to write to seek broader feedback and relay my experience trying to leverage constexpr in a real-world situation where compile-time computation is a requirement for performance. By way of a brief background, I am working on an library that computes expressions over higher-dimensional data types. The math itself isn't important for the purposes of this discussion, but compile-time techniques allow me to "collapse" higher dimensional data into sparse representations without needing runtime annotation. When working with expressions, authors of ETLs know that expressions can explode dramatically.
For example, if I encode a polynomial using templates (variables, monomials, and degrees) via parameter packs as some type P and later try to do a 4th order taylor expansion (just as an example), the number of terms will increase as O(N^4). I quickly found that templates were untenable due to memory and processing constraints (instantiating too many types, etc). This is *despite* aggressive usage of folds and ensuring that parameter packs had an imposed total order to minimize instantiations. Under this approach, is was astonishingly easy to completely cripple the compiler even for cases I would consider easy (clang profiling was tried, as well as general profiling tools). Again, this is code that runs perfectly fine at runtime, but the cost could easily be moved to compile time.
Thus, the next option is to leverage constexpr, encoding expressions as actual member data, and using templates solely to encode the size of the data. Folds become loops, and in principle, compilation time and memory usage should go down. Here are the things I encountered along the way:
1. The compiler is "dumb" in the sense that data that is entirely compile time deterministic is not usable as such. Going back to the example of polynomials. Suppose the number of addends in each is N1 and N2. Upon addition, the sum is at most N1 + N2, however, it is possible that some of the monomials combine. Trying to express this using any form of constexpr is extremely difficult. Common strategies include traversing the data twice (once to compute the size, a second time to actually compute it). In this case, the data itself is not encoded in types, and so this is not possible (despite, again, all the data being compile-time deterministic).
2. I have found that constexpr can ironically decrease performance in a number of situations. Once code was restructured as constexpr, the compiler often fails to inline code that should be entirely inlineable. Suppose we want to actually attach data to what is now a compiled expression template. All early attempts to do so failed because the compiler was unable to infer that the instantiation of the function could only possibly be invoked in one place (and thus should be inlined no questions asked). Part of the issue here is that constexpr as a keyword conveys very little information regarding what will ultimately happen when invoked. By the time I restructured the code to peel away the non-constexpr argument that had snuck in, the compile times became untenable again (once again, more data needed to be pushed to types) and it was back to the drawing board.
3. The proper usage of `if constexpr` feels a bit lackluster. The operand must be fully determined to be a constant expression, but this immediately qualifies "data" that is, again, static at compile time. Constant expression arguments proposed for C++2b is an option, but it should seem that an if block gated by the `is_constant_evaluated` should preclude that (it doesn't). Another issue is that in more complicated functions, it can be very difficult to reason about the branches when constexpr and non-constexpr branches must intermingle.
4. Allowing allocations which do not leave the constexpr context via constexpr new is not actually helpful in several situations I can think of. At the end of the data, often, the data needs to be variably sized, and in this scenario, you cannot use auto return type deduction to read off the size as a constant expression and create the appropriately sized type that I need.
Are there situations and workarounds to the above? Yes, but only after extensive rewrite and fighting the compiler every step of the way. Even at the end, I'm not sure if I will be getting the performance I could have gotten with just straight C-style procedural code for each operation I need to support. Granted, I think my application was perhaps a bit more extreme in its requirements, but I think it was a decent "sniff"-test for what I see as problems and headaches ahead. As it is, I'm doing things like rounding up capacities to the nearest size of two (and other such nonsense) to reduce the combinatorial explosion of type instantiations.
The idea I would like to float around is the following:
- Proposal: introduce a new function signature decorator called `consteval` (in the spirit of the variable consteval decorator) which is an input into overload resolution.
- All other overload resolution decorators are evaluated first (const, ref qualifier, ADL), and then `consteval` is considered. If the invocation occurs at compile time, the `consteval` overload is given precedence.
- All arguments passed to the consteval function are implicitly constexpr by definition
- All if statements and control statements are similarly implicitly constexpr (even if the notion for such a construct doesn't exist now, i.e. there is no `for constexpr` as of yet).
- Functions that are `consteval` can call non-consteval functions without any issue (it is a weaker qualification).
- All allocations are also permitted applying the rules with the currently proposed `constexpr new`, except they may leave the constexpr context and reside in the data segment (analogous to static variable initialization). Initially, I would imagine that only trivial destructors are permitted (but still this would unlock a lot of capability).
- All data mutated in the consteval context are also considered "constant expressions" when referenced later.
- Ergo, consteval produces an easy-to-reason about "frontier" of constant expressions. In conjunction with consteval as a variable decorator, there is no guesswork as to what will happen at compile time and what won't
- We dramatically simplify code and no longer need specific constexpr variants of various control structures and littered code with conditionals to activate "compile-time-only" code.
- If we want to share functionality, we can introduce a common entry point that does not have the `consteval` qualifier.
The theme of the idea is simple. Code that is C++ should just be C++, and how it behaves at compile time and runtime should differ by at most a keyword. I think such a feature is far easier to reason about (and to teach, just piggy-backing off general overload resolution principles), without overcrowding the API (at our current rate, we'll have a constexpr-paper variant for every language feature in C++). I would not propose deprecating any existing functionality immediately, but tbh, if I had something like `consteval` today, I would outright ban constexpr in my codebase.
Important: I don't (even remotely) claim that the above has worked out all the details, or that there aren't issues I haven't thought of. These are just thoughts that I kept coming back to now that I've had much more extensive experience working with constexpr (compared to just dotted isolated cases in the wild before). My standardese is weak, having never written a paper, so consider this primarily as a perspective from the lens of a practitioner.