Date: Thu, 7 Aug 2025 14:48:44 +0200
Hi Hans,
Is the speed difference compared with LLVM and GCC due to the implementation, or due to the function interface for the common standard types?
-----Original Message-----
From: Hans Åberg via Std-Proposals <std-proposals_at_[hidden]>
Sent: Thu 07.08.2025 14:45
Subject: [std-proposals] Multiprecision division
To: std-proposals_at_[hidden];
CC: Hans Åberg <haberg_1_at_[hidden]>;
I made a low-level multiprecision division function, similar in intent to the proposal https://isocpp.org/files/papers/P3161R4.html.
For an unsigned type Word, a dividend a[] of size m, and a divisor b[] of size n, the quotient is written into q[] and the remainder into a[]:
template<class Word>
inline void div(Word a[], size_t m, const Word b[], size_t n, Word q[])
With this structure, no internal allocations are needed, which is important for speed. Keeping b[] “const” is achieved via virtual shifts, which can be avoided by passing a b[] whose high bit is already set.
It requires a 2-by-1 word division, (a1 a0)/b:
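One possible reading of "virtual shifts" is that the normalized divisor is never stored: each shifted word is computed on the fly from two neighbouring words of the unmodified b[]. The sketch below illustrates that idea only. The names leading_zeros and shifted_word are hypothetical, and the least-significant-word-first layout is an assumption not stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: number of leading zero bits in x (returns the word
// width for x == 0). For a non-zero top word this is the normalization shift.
template<class Word>
constexpr int leading_zeros(Word x)
{
    constexpr int W = std::numeric_limits<Word>::digits;
    int count = 0;
    for (Word mask = Word{1} << (W - 1); mask != 0 && (x & mask) == 0; mask >>= 1)
        ++count;
    return count;
}

// Hypothetical helper: word i of (b << s), read without modifying b[].
// Assumes b[] holds n words, least significant first, 0 <= s < word width,
// and i <= n (i == n yields the bits shifted out of the top word).
template<class Word>
constexpr Word shifted_word(const Word b[], std::size_t n, std::size_t i, int s)
{
    constexpr int W = std::numeric_limits<Word>::digits;
    assert(i <= n && s >= 0 && s < W);
    const Word hi = i < n ? b[i] : Word{0};
    if (s == 0)
        return hi;  // no shift; also avoids an undefined shift by W bits
    const Word lo = i > 0 ? b[i - 1] : Word{0};
    return static_cast<Word>((hi << s) | (lo >> (W - s)));
}
```

If the top word of b[] already has its high bit set, the shift is zero and shifted_word degenerates to a plain array read, which matches the remark that the shifts can be avoided in that case.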
template<class Word>
std::pair<Word, Word> div(Word a1, Word a0, Word b);
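As a minimal sketch of the 2-by-1 primitive when a double-width type exists: for a 32-bit word, the two dividend words can be combined into a 64-bit value and divided directly. The name div2by1_wide is hypothetical, and the precondition a1 < b (so that the quotient fits in one word) is an assumption about the intended contract.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical illustration: divide the two-word value (a1 a0) by b using a
// double-width type. Returns {quotient, remainder}.
// Assumed precondition: b != 0 and a1 < b, so the quotient fits in 32 bits.
inline std::pair<std::uint32_t, std::uint32_t>
div2by1_wide(std::uint32_t a1, std::uint32_t a0, std::uint32_t b)
{
    assert(b != 0 && a1 < b);
    const std::uint64_t a = (std::uint64_t{a1} << 32) | a0;  // a1 is the high word
    return {static_cast<std::uint32_t>(a / b),
            static_cast<std::uint32_t>(a % b)};
}
```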
My implementation is currently twice as fast as that of LLVM, which in turn is faster than that of GCC.
I have also written a series of templates that choose the implementation: double-word arithmetic if available, otherwise half-word arithmetic (which can be overridden by assembler specializations).
More generally, the C++ standard might provide some larger fixed-size multiprecision types: uint128_t and beyond. The hard part is implementing division, but that is possible within the proposal mentioned above.
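A rough sketch of how such a selection could look, not the actual templates: a hypothetical trait double_word names a wider type when one exists, and a hypothetical dispatcher div2by1 falls back to a half-word routine otherwise. The fallback follows the well-known schoolbook scheme (Knuth's Algorithm D restricted to a two-word dividend, as in the "divlu" routine of Hacker's Delight). It assumes Word is at least as wide as unsigned int and that a1 < b.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>
#include <utility>

// Hypothetical trait: the double-width type for Word, or void if none is known.
template<class Word> struct double_word { using type = void; };
template<> struct double_word<std::uint32_t> { using type = std::uint64_t; };

// Half-word fallback: uses only single-word arithmetic.
template<class Word>
std::pair<Word, Word> div2by1_half(Word a1, Word a0, Word b)
{
    static_assert(std::is_unsigned_v<Word> && sizeof(Word) >= sizeof(unsigned));
    constexpr int W = std::numeric_limits<Word>::digits;
    constexpr int H = W / 2;
    constexpr Word base = Word{1} << H;  // half-word radix
    constexpr Word low = base - 1;       // mask for the low half
    assert(b != 0 && a1 < b);

    // Normalize: shift so that the divisor's high bit is set.
    int s = 0;
    for (Word t = b; (t >> (W - 1)) == 0; t <<= 1) ++s;
    b <<= s;
    const Word bn1 = b >> H, bn0 = b & low;
    const Word a32 = s ? static_cast<Word>((a1 << s) | (a0 >> (W - s))) : a1;
    const Word a10 = a0 << s;
    const Word an1 = a10 >> H, an0 = a10 & low;

    // One half-word quotient digit of (hi * base + next) / b, where hi < b.
    auto digit = [&](Word hi, Word next) {
        Word q = hi / bn1;
        Word r = hi - q * bn1;
        while (q >= base || q * bn0 > base * r + next) {
            --q;                   // estimate was too large; correct it
            r += bn1;
            if (r >= base) break;  // a further test would overflow and is not needed
        }
        return q;
    };
    const Word q1 = digit(a32, an1);
    const Word a21 = a32 * base + an1 - q1 * b;  // wraps modulo 2^W; result is exact
    const Word q0 = digit(a21, an0);
    const Word rem = (a21 * base + an0 - q0 * b) >> s;  // undo the normalization
    return {static_cast<Word>(q1 * base + q0), rem};
}

// Dispatcher: double-word arithmetic if available, otherwise half-word.
template<class Word>
std::pair<Word, Word> div2by1(Word a1, Word a0, Word b)
{
    using Wide = typename double_word<Word>::type;
    if constexpr (!std::is_void_v<Wide>) {
        constexpr int W = std::numeric_limits<Word>::digits;
        const Wide a = (Wide{a1} << W) | a0;
        return {static_cast<Word>(a / b), static_cast<Word>(a % b)};
    } else {
        return div2by1_half(a1, a0, b);
    }
}
```

With this trait, div2by1<std::uint32_t> takes the 64-bit path, while div2by1<std::uint64_t> uses the half-word routine. An assembler-backed version could be added as a further specialization or overload.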
--
Std-Proposals mailing list
Std-Proposals_at_[hidden]
https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
Received on 2025-08-07 12:59:00