Re: [std-proposals] Multiprecision division

From: Thiago Macieira <thiago_at_[hidden]>
Date: Thu, 07 Aug 2025 13:04:50 -0700
On Thursday, 7 August 2025 06:43:30 Pacific Daylight Time Hans Åberg via Std-Proposals wrote:
> Intel Coffee Lake 2.4–4.1 GHz
>             div     divq
> clang O3    28 ns   25 ns

That's a 10-year-old architecture. The latency for this instruction is up to
73 cycles on CFL: https://uops.info/html-instr/DIV_R64.html#CFL
25 ns corresponds to 73 cycles at 2.92 GHz, 60 cycles at 2.4 GHz, or anything
in between.

If you're going to micro-benchmark, I suggest measuring something that doesn't
change with frequency, like the cycle count or the number of instructions
retired (or both).

Sunny Cove improves that to 18 cycles:
https://uops.info/html-instr/DIV_R64.html#ICL
Similar on AMD Zen 3 and 4:
https://uops.info/html-instr/DIV_R64.html#ZEN4

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Principal Engineer - Intel Platform & System Engineering

Received on 2025-08-07 20:04:52