Date: Thu, 7 Aug 2025 14:04:46 +0000
Hi Hans,
I'm very interested in seeing how you are using these functions.
I'm currently in the process of rewriting the proposal. Your previous suggestion for a fused multiply-add seemed sensible, and I'm trying to come up with an implementation for it before deciding whether or not to include it.
In the last meeting the paper was criticized over the signed variants, and I'm trying to decide between:
a) dropping the signed variants, since they are not important for what I want to do and dropping them makes the paper much easier to defend.
b) going all in: sticking to the original design and writing a lot more justification to defend it.
Seeing how other people are using these may help me decide on a direction.
________________________________
From: Hans Åberg <haberg_1_at_[hidden]>
Sent: Thursday, August 7, 2025 2:45:17 PM
To: std-proposals_at_[hidden] <std-proposals_at_[hidden]>
Cc: Tiago Freire <tmiguelf_at_[hidden]>
Subject: Multiprecision division
I made a low-level multiprecision division function, similar in intent to the proposal https://isocpp.org/files/papers/P3161R4.html:
For an unsigned type Word, a dividend a[] of size m, and a divisor b[] of size n, the quotient is written into q[] and the remainder into a[]:
template<class Word>
inline void div(Word a[], size_t m, const Word b[], size_t n, Word q[])
By this structure, there is no need for internal allocations, which is important for speed. The “const” part is done via virtual shifts, which can be avoided by passing a b[] with the high bit set.
It requires a 2-by-1 word division, (a1 a0)/b:
template<class Word>
std::pair<Word, Word> div(Word a1, Word a0, Word b);
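To illustrate how a 2-by-1 primitive of this shape can drive a multiword division without any internal allocation, here is a minimal sketch for the simplest case, a single-word divisor. It is not the implementation described in this message. The names div_2by1 and div_by_word, the fixed 32-bit word, and the least-significant-word-first layout are all assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical 2-by-1 division for 32-bit words, using a 64-bit double word.
// Precondition: b != 0 and a1 < b, so the quotient fits in one word.
// Returns {quotient, remainder}.
inline std::pair<std::uint32_t, std::uint32_t>
div_2by1(std::uint32_t a1, std::uint32_t a0, std::uint32_t b) {
    std::uint64_t a = (std::uint64_t(a1) << 32) | a0;
    return { std::uint32_t(a / b), std::uint32_t(a % b) };
}

// Hypothetical short division: divides a[0..m) by the single word b.
// Assumed layout: a[0] is the least significant word.
// Writes m quotient words into q[] and returns the remainder.
// The caller supplies all storage, so nothing is allocated here.
inline std::uint32_t div_by_word(const std::uint32_t a[], std::size_t m,
                                 std::uint32_t b, std::uint32_t q[]) {
    std::uint32_t r = 0;  // running remainder, always < b
    for (std::size_t i = m; i-- > 0;) {
        // r < b holds on every step, so the 2-by-1 precondition is met.
        std::pair<std::uint32_t, std::uint32_t> qr = div_2by1(r, a[i], b);
        q[i] = qr.first;
        r = qr.second;
    }
    return r;
}
```

The running remainder is fed back as the high word of the next step, which is why the primitive only ever needs to produce a one-word quotient.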
My implementation is currently twice as fast as that of LLVM, which in turn is faster than that of GCC.
Then I have made a series of templates that choose the implementation: double-word if available, otherwise a half-word (which can be overridden by assembler specializations).
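One way such a selection could look is sketched below, as an illustration only and not the templates described in this message. It assumes that words of up to 32 bits can use std::uint64_t as the double word, and that wider words fall back to a half-word algorithm (the classic normalize-and-estimate scheme from Knuth and Hacker's Delight). All function names are hypothetical, and the half-word path assumes Word is at least as wide as unsigned int so that arithmetic wraps modulo 2^bits.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>
#include <utility>

// Double-word path: the two-word value fits in a 64-bit integer.
template<class Word>
std::pair<Word, Word> div_2by1_impl(Word a1, Word a0, Word b, std::true_type) {
    constexpr int bits = std::numeric_limits<Word>::digits;
    std::uint64_t a = (std::uint64_t(a1) << bits) | a0;
    return { Word(a / b), Word(a % b) };
}

// Half-word path: no wider type is assumed to exist.
template<class Word>
std::pair<Word, Word> div_2by1_impl(Word u1, Word u0, Word v, std::false_type) {
    constexpr int bits = std::numeric_limits<Word>::digits;
    constexpr int half = bits / 2;
    constexpr Word base = Word(1) << half;
    constexpr Word mask = base - 1;

    // Normalize: shift left until the divisor's high bit is set.
    int s = 0;
    while (!(v >> (bits - 1))) { v <<= 1; ++s; }
    Word vn1 = v >> half, vn0 = v & mask;
    Word un32 = s ? Word((u1 << s) | (u0 >> (bits - s))) : u1;
    Word un10 = u0 << s;
    Word un1 = un10 >> half, un0 = un10 & mask;

    // Estimate the high quotient digit, then correct it (at most twice).
    Word q1 = un32 / vn1, rhat = un32 - q1 * vn1;
    while (q1 >= base || q1 * vn0 > base * rhat + un1) {
        --q1; rhat += vn1;
        if (rhat >= base) break;
    }
    Word un21 = un32 * base + un1 - q1 * v;  // wraps by design

    // Same for the low quotient digit.
    Word q0 = un21 / vn1;
    rhat = un21 - q0 * vn1;
    while (q0 >= base || q0 * vn0 > base * rhat + un0) {
        --q0; rhat += vn1;
        if (rhat >= base) break;
    }
    // Undo the normalization shift on the remainder.
    return { Word(q1 * base + q0), Word((un21 * base + un0 - q0 * v) >> s) };
}

// Dispatcher. Precondition: b != 0 and a1 < b.
template<class Word>
std::pair<Word, Word> div_2by1(Word a1, Word a0, Word b) {
    static_assert(std::is_unsigned<Word>::value, "Word must be unsigned");
    return div_2by1_impl(a1, a0, b,
        std::integral_constant<bool, (std::numeric_limits<Word>::digits <= 32)>{});
}
```

An assembler-backed version could be supplied as an additional overload or specialization of the dispatcher for a particular Word, which is roughly the override mechanism mentioned above.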
In general, the C++ standard might have some larger fixed-size multiprecision types, such as uint128_t and beyond. The hard part is the implementation of division, but it is possible to do that within the proposal mentioned above.
Received on 2025-08-07 14:04:50