Date: Sun, 18 Feb 2024 17:35:15 +0100

> In this case you are doing modular arithmetic, in that situation your inputs can never be larger than what you can divide.

I think you're right; this didn't occur to me.

> That is where I disagree, and what I see is the exact opposite, people think they want 128bit arithmetic, but what they really want is an easy way to do multi-word arithmetic.

We've gone over this a number of times and honestly, it doesn't really
matter whether one thinks of 128-bit operations as 128-bit or
2x64-bit. It's a philosophical question.

What's much more important is what a language/library feature lets you
express. Basically, it comes down to the "bang for your buck": what it
costs to add to the language versus what benefit you get from it. I'm
starting to oppose multi-word arithmetic in principle because 128-bit
obsoletes almost all use cases in practice.

- You can implement mul_wide in terms of 128-bit multiplication
  between two 64-bit operands.
- You can implement add_wide in terms of 128-bit addition between two
  64-bit operands.
- You can implement div_wide in terms of 128-bit division with a 64-bit divisor.
- You can add [[assume]] to match the semantics of your function exactly.
- You can implement rem_wide in terms of 128-bit division.

What's the point of any of these functions then, if the user can
implement them trivially themselves, using a much more ergonomic, more
powerful, and more portable feature?
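To make the list above concrete, here is a minimal sketch, assuming the
GCC/Clang extension `unsigned __int128` stands in for a standard 128-bit
integer type. The struct `wide_result` and all four signatures are
hypothetical and not taken from the proposal. The precondition of
div_wide is written with `__builtin_unreachable()` so the snippet
compiles before C++23; with C++23 the same thing could be spelled
`[[assume(high < d)]];`.

```cpp
#include <cstdint>

// Assumption: unsigned __int128 is a GCC/Clang extension, not standard C++.
using u128 = unsigned __int128;

// Hypothetical two-word result type, used only for this illustration.
struct wide_result {
    std::uint64_t low;
    std::uint64_t high;
};

// 64 x 64 -> 128-bit multiplication.
inline wide_result mul_wide(std::uint64_t a, std::uint64_t b) {
    u128 p = u128(a) * b;
    return {std::uint64_t(p), std::uint64_t(p >> 64)};
}

// 64 + 64 -> 65-bit addition; `high` holds the carry (0 or 1).
inline wide_result add_wide(std::uint64_t a, std::uint64_t b) {
    u128 s = u128(a) + b;
    return {std::uint64_t(s), std::uint64_t(s >> 64)};
}

// (high:low) / d. Precondition: high < d, so the quotient fits in 64 bits.
inline std::uint64_t div_wide(std::uint64_t high, std::uint64_t low,
                              std::uint64_t d) {
    if (high >= d) __builtin_unreachable();  // C++23: [[assume(high < d)]];
    u128 n = (u128(high) << 64) | low;
    return std::uint64_t(n / d);
}

// (high:low) % d. Precondition: d != 0. The remainder always fits in 64 bits.
inline std::uint64_t rem_wide(std::uint64_t high, std::uint64_t low,
                              std::uint64_t d) {
    u128 n = (u128(high) << 64) | low;
    return std::uint64_t(n % d);
}
```

Whether a compiler turns div_wide into a single DIV here is exactly the
QoI question; the sketch only shows that the semantics are expressible.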

The codegen for some 128-bit operations is not optimal yet, but this
is 100% a quality of implementation issue. A proposal can't succeed on
QoI issues alone. For example, you could easily contribute a diff to
__umodti3 in GCC and clang which implements 128-bit remainder in terms
of two DIV instructions, like in the trick you've shown.
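The trick itself is not quoted in this message. Assuming it is the
usual reduction of the high word first, a rough sketch of the idea
could look like this. The function name `rem_128_by_64` is made up for
illustration, and `unsigned __int128` is again the GCC/Clang extension.
Each `%` below corresponds to a division whose quotient is guaranteed
to fit in 64 bits, which is what a hardware 128-by-64 divide needs.

```cpp
#include <cstdint>

// Assumption: unsigned __int128 is a GCC/Clang extension, not standard C++.
using u128 = unsigned __int128;

// Hypothetical helper: n % d for a 128-bit n and a nonzero 64-bit d,
// expressed as two divisions that cannot overflow a 64-bit quotient.
inline std::uint64_t rem_128_by_64(u128 n, std::uint64_t d) {
    std::uint64_t high = std::uint64_t(n >> 64);
    std::uint64_t low = std::uint64_t(n);

    // First division: reduce the high word, so that r < d.
    std::uint64_t r = high % d;

    // Second division: (r:low) / d has a quotient below 2^64 because r < d.
    u128 reduced = (u128(r) << 64) | low;
    return std::uint64_t(reduced % d);
}
```

This works because high * 2^64 + low is congruent to
(high % d) * 2^64 + low modulo d.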

> So does every single compiler that provide 128bit arithmetic,
> Look here:
> https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/builtins/udivmodti4.c#L84

If 128-bit is the use case, then just give me 128-bit integers. It's
completely pointless to go through these weird middle-men like
div_wide.

I'm not saying that this function isn't useful in principle, but it's
not useful in a way that warrants standardization. In the case of
implementing 128-bit arithmetic, just give me 128-bit integers.
128-bit numbers express 128-bit operations a thousand times better
than two 64-bit numbers.

However, more generally, your proposal is about abstracting hardware
operations. The problem here is that there is only a single
architecture which supports this operation in the first place. Every
architecture except x86_64 will have to software-emulate div_wide,
making standardization pointless.

Just write inline assembly. The people who are using _udiv128 and
other flavors of this already are doing so in projects that make heavy
use of x86 intrinsics anyway. You won't help these people much by
replacing one out of thousands of intrinsics with a standard function.

> In my opinion this is an extremely basal function. I shouldn't have to write assembly or re-invent the wheel to get it.

You can make the same case for every AVX-512 instruction as well. Why
don't we have a standard library wrapper for all AVX-512 instructions?
All other architectures can just software-emulate the functionality.

Furthermore, I think it's quite a stretch to call div_wide "extremely
basal". The fact that it's literally UB for a large number of inputs
and can only be used in specific cases makes it not so basal in my
opinion.

Received on 2024-02-18 16:35:27