ISOCPP std-proposals List: Re: [std-proposals] Multiprecision division

From: Howard Hinnant <howard.hinnant_at_[hidden]>
Date: Thu, 7 Aug 2025 12:40:10 -0400

Fwiw, here’s my attempt at multiprecision division:

https://github.com/HowardHinnant/bbi
HowardHinnant/bbi: Better Behaved Integers
github.com
https://github.com/HowardHinnant/bbi/blob/master/bbi.h#L2542-L2637

I include it in this discussion just to widen the conversation, and perhaps give you something to show how much faster your algorithm is compared to this (I have no doubt it is). My implementation is limited to power-of-2 bit widths, and defines division of bit-width N in terms of operations on bit-width N/2. This recursivie formulation also avoids memory allocation.

I know practically nothing about this domain. I got my algorithms from Hacker’s Delight:

https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0321842685#:~:text=In%20Hacker's%20Delight%2C%20Second%20Edition,by%20the%20opportunity%20to%20improve.

For printing I noted that division by 10 was glacial and provided an optimization for this divisor:

https://github.com/HowardHinnant/bbi/blob/master/bbi.h#L2878-L2895
bbi/bbi.h at master · HowardHinnant/bbi
github.com

Howard

On Aug 7, 2025, at 8:44 AM, Hans Åberg via Std-Proposals <std-proposals_at_[hidden]> wrote:
>
> I made a low-level multiprecision division function, similar in intent to the proposal https://isocpp.org/files/papers/P3161R4.html:
>
> For an unsigned type Word, dividend a[] of size n, divisor b[] of size b, the quotient is written into q[] and the remainder into a[]:
> template<class Word>
> inline void div(Word a[], size_t m, const Word b[], size_t n, Word q[])
> By this structure, there is no need for internal allocations, which is important for speed. The “const” part is done via virtual shifts, which can be avoided by passing a b[] with the high bit set.
>
> It requires a 2-by-1 word division a1 a0/b0
> template<class Word>
> std::pair<Word, Word> div(Word a1, Word a0, Word b);
> My implementation is currently twice as fast as that of LLVM, which in turn is faster than that of GCC.
>
> Then I have made a series of templates that choose the implementation: double-word if available, otherwise a half-word (which can be overridden by assembler specializations).
>
> In general, the C++ standard might have some higher fixed-size multiprecision types, uint128_t, and higher. The hard part is the implementation of division, but it is possible to do that within the proposal mentioned above.
>
>
> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals

Received on 2025-08-07 16:40:46