Date: Sun, 18 Feb 2024 19:32:54 +0100

> Ok, here's a challenge. Try implementing the standard set of 512bit operations in terms of 128bit casts, and let's see how fun that is.

I don't see how this is relevant, but it's not any more difficult than
doing it with widening functions.

Say you have implemented 512-bit multiplication in terms of mul_wide
between 64-bit integers. Now simply replace every occurrence of
mul_wide with a 128-bit multiplication. 64-bit mul_wide is
functionally equivalent to 128-bit multiplication with 64-bit
operands.
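As a rough sketch of what that replacement looks like, here is a schoolbook multi-limb multiplication where each mul_wide call has been replaced by a 128-bit multiplication. The function name mul_limbs and the little-endian std::array limb layout are assumptions made for illustration only, and unsigned __int128 is a GCC/Clang extension rather than standard C++.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: multiplies two N-limb unsigned integers stored as
// little-endian 64-bit limbs and returns the full 2N-limb product.
// N = 8 would correspond to 512-bit operands.
template <std::size_t N>
constexpr std::array<std::uint64_t, 2 * N>
mul_limbs(const std::array<std::uint64_t, N>& a,
          const std::array<std::uint64_t, N>& b) {
    std::array<std::uint64_t, 2 * N> result{};
    for (std::size_t i = 0; i < N; ++i) {
        std::uint64_t carry = 0;
        for (std::size_t j = 0; j < N; ++j) {
            // This is where mul_wide(a[i], b[j]) would otherwise appear.
            // The sum cannot overflow 128 bits:
            // (2^64 - 1)^2 + 2 * (2^64 - 1) == 2^128 - 1.
            unsigned __int128 t =
                static_cast<unsigned __int128>(a[i]) * b[j]
                + result[i + j] + carry;
            result[i + j] = static_cast<std::uint64_t>(t);  // low 64 bits
            carry = static_cast<std::uint64_t>(t >> 64);    // high 64 bits
        }
        result[i + N] = carry;  // not yet written by any earlier row
    }
    return result;
}
```

The carry handling also gets simpler here, because the partial product, the existing limb, and the incoming carry can all be added in one 128-bit expression.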

With these two options, which one is better? Well, I'd argue that the
128-bit version is better because it would be faster in constant
evaluations and debug builds since no function call is necessary. It
also requires no include other than #include <cstdint> and even that
is unnecessary if you go straight for __int128. There's also no chance
of confusing the low and high bits of the mul_wide results because
this is totally unambiguous for a 128-bit integer.
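To illustrate the point about low and high bits, here is a minimal sketch that assumes a mul_wide-like function returning a pair of halves. The names wide_result and mul_wide_via_int128 are hypothetical and not taken from any proposal. With the 128-bit product, the low half is a truncation and the high half is a shift by 64, so there is nothing to mix up, and the whole thing works in constant evaluation with GCC and Clang.

```cpp
#include <cstdint>

// Hypothetical result type for a mul_wide-style interface.
struct wide_result {
    std::uint64_t low;
    std::uint64_t high;
};

// Hypothetical stand-in for mul_wide, expressed as a 128-bit multiplication.
// unsigned __int128 is a GCC/Clang extension, not standard C++.
constexpr wide_result mul_wide_via_int128(std::uint64_t x, std::uint64_t y) {
    const unsigned __int128 product = static_cast<unsigned __int128>(x) * y;
    return {static_cast<std::uint64_t>(product),         // low: truncate
            static_cast<std::uint64_t>(product >> 64)};  // high: shift
}

// Usable in constant expressions: 2^32 * 2^32 == 2^64.
static_assert(mul_wide_via_int128(1ull << 32, 1ull << 32).low == 0);
static_assert(mul_wide_via_int128(1ull << 32, 1ull << 32).high == 1);
```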

All in all, it seems like 128-bit integers are vastly superior for
EXPRESSING these widening multiplications. It should also be noted
that the codegen is optimal.

> unsigned __int128 mul(unsigned long long x, unsigned long long y) {
>     return (unsigned __int128) x * y;
> }

Clang emits:

> mul(unsigned long long, unsigned long long): # @mul(unsigned long long, unsigned long long)
>         mov     rdx, rsi
>         mulx    rdx, rax, rdi
>         ret

So 128-bit multiplication should be just as good for IMPLEMENTING
widening multiplication as well.

If 128-bit integers are superior for EXPRESSING multiplication and at
least as good for IMPLEMENTING it, what merit does a widening
multiplication function have? You'll need to find some metric by which
mul_wide is superior to such an extent that the committee will
consider including it in the standard.

Received on 2024-02-18 18:33:06