Date: Thu, 7 Aug 2025 14:44:59 +0200
I made a low-level multiprecision division function, similar in intent to the proposal https://isocpp.org/files/papers/P3161R4.html:
For an unsigned type Word, dividend a[] of size n, divisor b[] of size b, the quotient is written into q[] and the remainder into a[]:
template<class Word>
inline void div(Word a[], size_t m, const Word b[], size_t n, Word q[])
By this structure, there is no need for internal allocations, which is important for speed. The “const” part is done via virtual shifts, which can be avoided by passing a b[] with the high bit set.
It requires a 2-by-1 word division a1 a0/b0
template<class Word>
std::pair<Word, Word> div(Word a1, Word a0, Word b);
My implementation is currently twice as fast as that of LLVM, which in turn is faster than that of GCC.
Then I have made a series of templates that choose the implementation: double-word if available, otherwise a half-word (which can be overridden by assembler specializations).
In general, the C++ standard might have some higher fixed-size multiprecision types, uint128_t, and higher. The hard part is the implementation of division, but it is possible to do that within the proposal mentioned above.
For an unsigned type Word, dividend a[] of size n, divisor b[] of size b, the quotient is written into q[] and the remainder into a[]:
template<class Word>
inline void div(Word a[], size_t m, const Word b[], size_t n, Word q[])
By this structure, there is no need for internal allocations, which is important for speed. The “const” part is done via virtual shifts, which can be avoided by passing a b[] with the high bit set.
It requires a 2-by-1 word division a1 a0/b0
template<class Word>
std::pair<Word, Word> div(Word a1, Word a0, Word b);
My implementation is currently twice as fast as that of LLVM, which in turn is faster than that of GCC.
Then I have made a series of templates that choose the implementation: double-word if available, otherwise a half-word (which can be overridden by assembler specializations).
In general, the C++ standard might have some higher fixed-size multiprecision types, uint128_t, and higher. The hard part is the implementation of division, but it is possible to do that within the proposal mentioned above.
Received on 2025-08-07 12:45:14