ISOCPP std-proposals List: Re: [std-proposals] Multiprecision division

From: Tiago Freire <tmiguelf_at_[hidden]>
Date: Thu, 7 Aug 2025 14:04:46 +0000

Hi Hans,
I'm very interested in seeing how you are using these functions.

I'm currently rewriting the proposal. Your earlier suggestion of a fused multiply-add seemed sensible, and I'm trying to come up with an implementation of it before deciding whether or not to include it.

In the last meeting the paper was criticized for its treatment of the signed variants, and I'm trying to decide between:
a) dropping the signed variants, since they are not important for what I want to do, and dropping them makes the paper much easier to defend;
b) going all in: sticking to the original design and writing a lot more justification to defend it.

Seeing how other people use these functions may help me decide on a direction.

________________________________
From: Hans Åberg <haberg_1_at_[hidden]>
Sent: Thursday, August 7, 2025 2:45:17 PM
To: std-proposals_at_[hidden] <std-proposals_at_[hidden]>
Cc: Tiago Freire <tmiguelf_at_[hidden]>
Subject: Multiprecision division

I made a low-level multiprecision division function, similar in intent to the proposal https://isocpp.org/files/papers/P3161R4.html:

For an unsigned type Word, a dividend a[] of size m and a divisor b[] of size n, the quotient is written into q[] and the remainder into a[]:
  template<class Word>
  inline void div(Word a[], size_t m, const Word b[], size_t n, Word q[])
With this structure there is no need for internal allocations, which is important for speed. Keeping b[] “const” is handled via virtual shifts, which can be avoided by passing a b[] whose most significant word has its high bit set.
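For reference, here is a minimal sketch of a function with this calling convention, using the classic schoolbook method (Knuth's Algorithm D); it is not the code described above. Assumptions: 32-bit limbs stored least significant first, 64-bit intermediates standing in for the 2-by-1 primitive, m >= n >= 1, and a divisor whose top limb already has its high bit set, so no shifts (virtual or real) are needed. The name `div_normalized` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of schoolbook division (Knuth's Algorithm D).
// Limbs are 32-bit, least significant first. Assumes m >= n >= 1 and that
// b is normalized (high bit of b[n-1] set), so no shifting is required.
// On return q[0..m-n] holds the quotient, a[0..n-1] the remainder,
// and a[n..m-1] are zero.
void div_normalized(std::uint32_t a[], std::size_t m,
                    const std::uint32_t b[], std::size_t n,
                    std::uint32_t q[]) {
    constexpr std::uint64_t base = std::uint64_t(1) << 32;

    // Because b is normalized, the top quotient limb is 0 or 1:
    // compare the top n limbs of a with b and subtract once if a >= b.
    bool ge = true;
    for (std::size_t i = n; i-- > 0;) {
        if (a[m - n + i] != b[i]) { ge = a[m - n + i] > b[i]; break; }
    }
    q[m - n] = ge ? 1u : 0u;
    if (ge) {
        std::uint64_t borrow = 0;
        for (std::size_t i = 0; i < n; ++i) {
            std::uint64_t d = std::uint64_t(a[m - n + i]) - b[i] - borrow;
            a[m - n + i] = static_cast<std::uint32_t>(d);
            borrow = d >> 63;  // 1 if the subtraction wrapped around
        }
    }

    // Remaining limbs: estimate each quotient limb from the top two limbs.
    for (std::size_t j = m - n; j-- > 0;) {
        std::uint64_t num = (std::uint64_t(a[j + n]) << 32) | a[j + n - 1];
        std::uint64_t qhat = num / b[n - 1];
        std::uint64_t rhat = num % b[n - 1];
        // Knuth's correction step using the second divisor limb.
        while (qhat >= base ||
               (n >= 2 && qhat * b[n - 2] > ((rhat << 32) | a[j + n - 2]))) {
            --qhat;
            rhat += b[n - 1];
            if (rhat >= base) break;
        }
        // Multiply and subtract: a[j..j+n] -= qhat * b.
        std::uint64_t carry = 0, borrow = 0;
        for (std::size_t i = 0; i < n; ++i) {
            std::uint64_t p = qhat * b[i] + carry;
            carry = p >> 32;
            std::uint64_t d = std::uint64_t(a[i + j]) - (p & 0xFFFFFFFFu) - borrow;
            a[i + j] = static_cast<std::uint32_t>(d);
            borrow = d >> 63;
        }
        std::uint64_t top = std::uint64_t(a[j + n]) - carry - borrow;
        a[j + n] = static_cast<std::uint32_t>(top);
        if (top >> 63) {
            // qhat was one too large: add b back once.
            --qhat;
            std::uint64_t c = 0;
            for (std::size_t i = 0; i < n; ++i) {
                std::uint64_t s = std::uint64_t(a[i + j]) + b[i] + c;
                a[i + j] = static_cast<std::uint32_t>(s);
                c = s >> 32;
            }
            a[j + n] = static_cast<std::uint32_t>(a[j + n] + c);
        }
        q[j] = static_cast<std::uint32_t>(qhat);
    }
}
```

The caller supplies q[] with room for m - n + 1 limbs, so the function itself never allocates.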

It requires a 2-by-1 word division, dividing the two-word value a1 a0 by the single word b:
  template<class Word>
  std::pair<Word, Word> div(Word a1, Word a0, Word b);
My implementation is currently twice as fast as that of LLVM, which in turn is faster than that of GCC.

I have also written a series of templates that select the implementation: a double-word type if one is available, otherwise half-word arithmetic (which can be overridden by assembler specializations).

More generally, the C++ standard could provide larger fixed-size multiprecision types, such as uint128_t and beyond. The hard part is implementing division, but that can be done within the proposal mentioned above.
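As a small illustration of how such a fixed-size type can reuse the 2-by-1 primitive: dividing an N-word value by a single word is a chain of 2-by-1 divisions, each taking the running remainder as its high word. The alias `fixed_uint` and the function `divide_by_word` are hypothetical, and a native 64-bit division stands in for the 2-by-1 primitive; a multi-word divisor needs the full algorithm instead.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size unsigned integer: N 32-bit limbs, least significant first.
template<std::size_t N>
using fixed_uint = std::array<std::uint32_t, N>;

// Divides x by a single nonzero word d in place and returns the remainder.
// Each step is one 2-by-1 division (r:x[i]) / d with r < d, so every
// quotient limb fits in one word.
template<std::size_t N>
std::uint32_t divide_by_word(fixed_uint<N>& x, std::uint32_t d) {
    std::uint64_t r = 0;
    for (std::size_t i = N; i-- > 0;) {
        const std::uint64_t num = (r << 32) | x[i];
        x[i] = static_cast<std::uint32_t>(num / d);
        r = num % d;
    }
    return static_cast<std::uint32_t>(r);
}
```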

Received on 2025-08-07 14:04:50