std-discussion: Seeking a better constexpr

From: Jeremy Ong <jeremycong_at_[hidden]>
Date: Sun, 13 Oct 2019 22:57:40 -0600

I wanted to write to seek broader feedback and relay my experience trying
to leverage constexpr in a real-world situation where compile-time
computation is a requirement for performance. By way of a brief background,
I am working on an library that computes expressions over
higher-dimensional data types. The math itself isn't important for the
purposes of this discussion, but compile-time techniques allow me to
"collapse" higher dimensional data into sparse representations without
needing runtime annotation. When working with expressions, authors of ETLs
know that expressions can explode dramatically.

For example, if I encode a polynomial using templates (variables,
monomials, and degrees) via parameter packs as some type P and later try to
do a 4th order taylor expansion (just as an example), the number of terms
will increase as O(N^4). I quickly found that templates were untenable due
to memory and processing constraints (instantiating too many types, etc).
This is *despite* aggressive usage of folds and ensuring that parameter
packs had an imposed total order to minimize instantiations. Under this
approach, is was astonishingly easy to completely cripple the compiler even
for cases I would consider easy (clang profiling was tried, as well as
general profiling tools). Again, this is code that runs perfectly fine at
runtime, but the cost could easily be moved to compile time.

Thus, the next option is to leverage constexpr, encoding expressions as
actual member data, and using templates solely to encode the size of the
data. Folds become loops, and in principle, compilation time and memory
usage should go down. Here are the things I encountered along the way:

1. The compiler is "dumb" in the sense that data that is entirely compile
time deterministic is not usable as such. Going back to the example of
polynomials. Suppose the number of addends in each is N1 and N2. Upon
addition, the sum is *at most *N1 + N2, however, it is possible that some
of the monomials combine. Trying to express this using any form of
constexpr is extremely difficult. Common strategies include traversing the
data twice (once to compute the size, a second time to actually compute
it). In this case, the data itself is not encoded in types, and so this is
not possible (despite, again, all the data being compile-time
deterministic).

2. I have found that constexpr can ironically decrease performance in a
number of situations. Once code was restructured as constexpr, the compiler
often fails to inline code that should be entirely inlineable. Suppose we
want to actually attach data to what is now a compiled expression template.
All early attempts to do so failed because the compiler was unable to infer
that the instantiation of the function could only possibly be invoked in
one place (and thus should be inlined no questions asked). Part of the
issue here is that constexpr as a keyword conveys *very little
information* regarding
what will ultimately happen when invoked. By the time I restructured the
code to peel away the non-constexpr argument that had snuck in, the compile
times became untenable again (once again, more data needed to be pushed to
types) and it was back to the drawing board.

3. The proper usage of `if constexpr` feels a bit lackluster. The operand
must be fully determined to be a constant expression, but this immediately
qualifies "data" that is, again, static at compile time. Constant
expression arguments proposed for C++2b is an option, but it should seem
that an if block gated by the `is_constant_evaluated` should preclude that
(it doesn't). Another issue is that in more complicated functions, it can
be very difficult to reason about the branches when constexpr and
non-constexpr branches must intermingle.

4. Allowing allocations which do not leave the constexpr context via
constexpr new is not actually helpful in several situations I can think of.
At the end of the data, often, the data needs to be variably sized, and in
this scenario, you cannot use auto return type deduction to read off the
size as a constant expression and create the appropriately sized type that
I need.

Are there situations and workarounds to the above? Yes, but only after
extensive rewrite and fighting the compiler every step of the way. Even at
the end, I'm not sure if I will be getting the performance I could have
gotten with just straight C-style procedural code for each operation I need
to support. Granted, I think my application was perhaps a bit more extreme
in its requirements, but I think it was a decent "sniff"-test for what I
see as problems and headaches ahead. As it is, I'm doing things like
rounding up capacities to the nearest size of two (and other such nonsense)
to reduce the combinatorial explosion of type instantiations.

*The idea I would like to float around is the following:*

- Proposal: introduce a new function signature decorator called `consteval`
(in the spirit of the variable consteval decorator) which is an input into
overload resolution.
- All other overload resolution decorators are evaluated first (const, ref
qualifier, ADL), and then `consteval` is considered. If the invocation
occurs at compile time, the `consteval` overload is given precedence.
- All arguments passed to the consteval function are *implicitly constexpr *by
definition
- All if statements and control statements are similarly implicitly
constexpr (even if the notion for such a construct doesn't exist now, i.e.
there is no `for constexpr` as of yet).
- Functions that are `consteval` can call non-consteval functions without
any issue (it is a weaker qualification).
- All allocations are also permitted applying the rules with the currently
proposed `constexpr new`, except *they may leave *the constexpr context and
reside in the data segment (analogous to static variable initialization).
Initially, I would imagine that only trivial destructors are permitted (but
still this would unlock a lot of capability).
- All data mutated in the consteval context are also considered "constant
expressions" when referenced later.
- Ergo, consteval produces an easy-to-reason about "frontier" of constant
expressions. In conjunction with consteval as a variable decorator, there
is no guesswork as to what will happen at compile time and what won't
- We dramatically simplify code and no longer need specific constexpr
variants of various control structures and littered code with conditionals
to activate "compile-time-only" code.
- If we want to share functionality, we can introduce a common entry point
that does not have the `consteval` qualifier.

The theme of the idea is simple. Code that is C++ should just be C++, and
how it behaves at compile time and runtime should differ by at most a
keyword. I think such a feature is far easier to reason about (and to
teach, just piggy-backing off general overload resolution principles),
without overcrowding the API (at our current rate, we'll have a
constexpr-paper variant for every language feature in C++). I would not
propose deprecating any existing functionality immediately, but tbh, if I
had something like `consteval` today, I would outright ban constexpr in my
codebase.

Important: I don't (even remotely) claim that the above has worked out all
the details, or that there aren't issues I haven't thought of. These are
just thoughts that I kept coming back to now that I've had much more
extensive experience working with constexpr (compared to just dotted
isolated cases in the wild before). My standardese is weak, having never
written a paper, so consider this primarily as a perspective from the lens
of a practitioner.

Thoughts/feedback appreciated,
Jeremy

Received on 2019-10-14 00:00:42