Date: Fri, 08 Aug 2025 08:31:03 -0700
On Friday, 8 August 2025 05:31:27 Pacific Daylight Time Hans Åberg via Std-Proposals wrote:
> There is 9–10% overhead with std::span relative to using clang O3 in some 10
> million random division runs of the function “div32” calling the “div”
> function I mentioned. Somewhat less with g++ O3, but it has somewhat worse
> absolute timings on the “without” variation.
For this mailing list, all of the above is completely irrelevant.
Don't get me wrong: performance improvements are important, and we
appreciate that you've found an area that could use improvement. Your proposal
also rests on the premise that there is a performance gain to be had.
But the specific numbers and techniques are less important.
What you had to do in order to benchmark your implementation isn't relevant to
the discussion of the API. In fact, some vendors may not even be allowed to
look at your code. If compiler vendors see an optimisation bottleneck in their
own compiler, they can fix it themselves. They may also have techniques you are
not allowed to use.
Your benchmark numbers are also not very important. Technology will keep
progressing quickly, so they will soon be stale. Vendors will also have
constraints you didn't have: they may need to target older processor
generations or architectures you haven't addressed. They may also choose to
implement the algorithm directly in assembly for some cases, such as dividing a
128-bit dividend by a 64-bit divisor on x86-64, which is a single instruction.
In summary, do not design your API based on your benchmarking techniques,
limitations, or results.
-- Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org Principal Engineer - Intel Platform & System Engineering
Received on 2025-08-08 15:31:09