Date: Sun, 18 Feb 2024 20:00:28 +0100
> Aren't you just describing multiprecision integer math, as seen in e.g.
Essentially yeah, but neither Tiago's proposed xxx_wide functions
nor 128-bit integers give you full multi-precision, only the
fundamental operations to build it yourself.
For context, I have written a 128-bit proposal
(https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3140r0.html)
and I'm a proponent of doing multi-precision using 128-bit operations.
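As a rough illustration of "building it yourself", here is a minimal
sketch of multi-limb addition on top of a 128-bit add. The names limb
and add_n are made up for this example, limbs are assumed to be 64 bits
wide and stored least significant first, and unsigned __int128 is the
GCC/Clang extension rather than anything standard:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using limb = unsigned long long; // assumption: 64 bits wide

// Hypothetical helper: adds two little-endian limb sequences of equal
// length, writes the result to out, and returns the carry out of the
// most significant limb.
bool add_n(const std::vector<limb>& a, const std::vector<limb>& b,
           std::vector<limb>& out) {
    assert(a.size() == b.size());
    out.resize(a.size());
    bool carry = false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // The 128-bit sum cannot wrap: (2^64 - 1) + (2^64 - 1) + 1 < 2^128.
        auto r = (unsigned __int128) a[i] + b[i] + carry;
        out[i] = static_cast<limb>(r); // low 64 bits
        carry = bool(r >> 64);         // bit 64 is the carry into the next limb
    }
    return carry;
}
```

The loop body is the same widening add as a single-limb add_wide, just
with a carry-in; everything beyond that (storage, sign handling, and so
on) is what a full multi-precision library would add on top.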
> I don't know how Abseil, Boost.Multiprecision, or Clang's _BitInt stack up these days
From what I've seen, the codegen that Clang has for _BitInt and
__int128 is very solid. There are just a few missed optimizations here
and there. I hope it will improve further over time. For example:
struct add_result {
    unsigned long long sum;
    bool carry;
};

add_result add_wide_1(unsigned long long x, unsigned long long y) {
    auto r = (unsigned __int128) x + y;
    return add_result{static_cast<unsigned long long>(r), bool(r >> 64)};
}

add_result add_wide_2(unsigned long long x, unsigned long long y) {
    unsigned long long r;
    bool carry = __builtin_add_overflow(x, y, &r);
    return add_result{r, carry};
}
These are two competing implementations of an add_wide function. Clang
is able to produce the same assembly for both of these:
https://godbolt.org/z/bcx8qoqnn
add_wide(unsigned long long, unsigned long long):
        mov     rax, rdi
        add     rax, rsi
        setb    dl
        ret
GCC has slightly worse codegen for both, but to be fair, I haven't
tried massaging this code to squeeze out the perfect assembly.
Received on 2024-02-18 19:00:40