[std-proposals] Benchmarking contract evaluation semantics mode "assume" in Eigen.

From: Adrian Johnston <ajohnston4536_at_[hidden]>
Date: Sat, 28 Mar 2026 15:06:15 -0700

Hello,

I would like to provide better benchmarking numbers for the effects of
converting all contracts in P2900 to assume statements. The executive
summary is that I had mixed results with Eigen and could argue this is a
bug in clang's optimizer.

This is not a proposal to modify P2900. Specifically, P2900 lists this as
something not proposed: *2.3 Features Not Proposed: The ability to
assume that an unchecked contract predicate would evaluate to true and to
allow the compiler to optimize based on that assumption, i.e., the assume
semantic.*

To provide a benchmark, my immediate concern was finding a representative
project. Most well known high-performance projects already use
__builtin_unreachable() (a.k.a. [[assume(false)]]) and it didn't seem too
credible to experiment on something that has already been carefully
instrumented that way.
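
For readers less familiar with the idiom, here is a minimal sketch of the
kind of instrumentation meant above. The names hyp_unreachable and
axis_index are hypothetical and are not taken from any particular project.
The helper tells the optimizer that a point in the code is never reached,
which is the same promise [[assume(false)]] makes.

```cpp
// Hypothetical example, not taken from any real project.
enum class Axis { X, Y, Z };

// Marks a point the programmer promises is never reached.
// Reaching it is undefined behavior, exactly as with [[assume(false)]].
[[noreturn]] inline void hyp_unreachable() {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_unreachable();
#elif defined(_MSC_VER)
    __assume(false);
#endif
}

inline int axis_index(Axis a) {
    switch (a) {
        case Axis::X: return 0;
        case Axis::Y: return 1;
        case Axis::Z: return 2;
    }
    // The compiler may drop any code path for out-of-range enum values.
    hyp_unreachable();
}
```

Calling axis_index with one of the three enumerators is well defined. Any
other value would violate the promise and make the program's behavior
undefined.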

Meanwhile P2646R0, the paper that successfully lobbied for the inclusion of
[[assume]], used a synthetic benchmark that is not representative of
widespread use across a non-trivial program either.

In the end, I settled on the Eigen math library because it is well known,
has no assume statements and has a benchmark. The results were mixed and
not too encouraging. Eigen may not have been an ideal test subject either
as it has been carefully optimized too. Please consider this a request for
comments, as I would prefer a benchmark that would be easily recognized as
a good example of this technique being applied to a "normal, well written,
performance sensitive application."

These numbers were acquired by converting all asserts to __builtin_assume()
and then using an AI tool to add some more assume hints. The command line
was: clang++-18 -O3 -march=native. The test machine is an AMD CPU with
AVX-512 and AVX2/FMA.
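
A conversion of this kind can be sketched as follows. This is only an
illustration under stated assumptions, not the actual patch applied to
Eigen: the macro names HYP_CONTRACT and HYP_CONTRACTS_ASSUME and the
function sum_blocks_of_8 are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical switch: define HYP_CONTRACTS_ASSUME to turn every check
// into an optimizer hint instead of a runtime assertion.
#if defined(HYP_CONTRACTS_ASSUME) && defined(__clang__)
#  define HYP_CONTRACT(cond) __builtin_assume(cond)
#elif defined(HYP_CONTRACTS_ASSUME) && defined(__GNUC__)
#  define HYP_CONTRACT(cond) \
      do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#  define HYP_CONTRACT(cond) assert(cond)
#endif

// Hypothetical kernel: the preconditions are checked in the default
// build and assumed (never checked) in the "assume" build.
inline float sum_blocks_of_8(const float* data, std::size_t n) {
    HYP_CONTRACT(data != nullptr);
    HYP_CONTRACT(n % 8 == 0);  // hint that no scalar remainder loop is needed
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

In the assume build, a caller that violates a precondition gets undefined
behavior instead of a failed assertion. Whether the hint changes the
generated code at all depends on the compiler and the surrounding code.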

Largest improvements (assume is faster):
  - bench_gemm float: nearly all cases 1-9% faster, up to -8.53% at 192x192
  - TRMV_float_Lower/512: -8.46%
  - VectorNorm_double/65536: -14.01%
  - VectorMinCoeff_float/4096: -7.59%, /262144: -7.43%

Largest regressions (assume is slower):
  - BlockRead_float/512/64: +78.29% - extreme outlier, possible cache effect
  - TRSV_float_Lower/512: +18.78%, /128: +17.61% - consistent, likely real
  - TRMV_float_UnitUpper/1024: +18.89%, TRMV_float_Upper/1024: +18.51%
  - TRMV_double_UnitUpper/2048: +15.63%
  - Dot_cfloat/1048576: +15.76%, Dot_double/65536: +15.78%
  - VectorLpNormInf_double/1048576: +20.62% - worst case apart from the
    BlockRead outlier

The assume hints help float GEMM consistently but significantly hurt some
triangular routines (TRSV Lower, TRMV Upper) and some large-vector
reductions. The __builtin_assume hints appear to misguide the compiler's
vectorization decisions for certain triangular traversal patterns and
large strided reductions.

In practice, Eigen contains cases where clang takes optimization hints and
produces slower code. Arguably this is a bug in clang's optimizer. It is
therefore advisable to keep an eye on your generated assembly and timing
numbers as you add assume hints. I have seen some reports from embedded
systems programmers saying that __builtin_unreachable() is extremely
important to them for reducing code size, and the widespread use of
__builtin_unreachable() also shows its importance. However, it is hard to
encourage wholesale global use today based on these numbers. Attached is
the full benchmark data.
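
As a rough illustration of checking timing numbers before and after adding
a hint, here is a minimal sketch. It is not the harness used for the
numbers above. The helper names are hypothetical, and the sign convention
(negative means the hinted build is faster) is assumed to match the lists
above.

```cpp
#include <chrono>

// Hypothetical helper: relative change in percent.
// Negative means the candidate (e.g. the assume build) is faster.
inline double percent_change(double baseline_ns, double candidate_ns) {
    return (candidate_ns - baseline_ns) * 100.0 / baseline_ns;
}

// Hypothetical helper: best-of-N wall-clock time of one call to f,
// in nanoseconds. Taking the minimum reduces noise from other processes.
template <class F>
double best_time_ns(F&& f, int repeats = 20) {
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        f();
        const auto t1 = std::chrono::steady_clock::now();
        const double ns =
            std::chrono::duration<double, std::nano>(t1 - t0).count();
        if (best < 0.0 || ns < best) {
            best = ns;
        }
    }
    return best;
}
```

One would time the same kernel built with and without the hints and pass
both results to percent_change. A real comparison also needs to keep the
compiler from optimizing the measured work away, which this sketch does
not attempt.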

Again, if you have a suggestion for a benchmark that is not already
optimized with assume semantics, is open source and would be particularly
credible as an example of whole program optimization, let me know.

All the best,
Adrian Johnston

Received on 2026-03-28 22:06:27