ISOCPP std-proposals List: Re: [std-proposals] Integer overflow arithmetic

From: Jan Schultke <janschultke_at_[hidden]>
Date: Sat, 17 Feb 2024 23:16:15 +0100

> The functions that I have suggested (except for checks), even if not in all CPUs, there is a CPU with such an instruction. Functions like what you are suggesting would always need to be written as a function.

I'm sceptical of that. I have checked the hardware support for
128-to-64-bit division and to my knowledge, ARM and RISC-V don't
usually have support for this operation. x86_64 is an outlier with its
DIV instruction.

> I really would like to avoid adding more content that I would have to defend in front of the committee if I personally don't understand its use case.

That's understandable. It would be great if you could look into this
issue more though.

> Maybe if you can give me an example (github page?) of an algorithm of where that function would be used I can consider.

You need this function for modular arithmetic in general. I guess
here's one example https://stackoverflow.com/q/2207006
Many people naively use the % operator, but this produces incorrect
results when you don't have bits to spare. For example, (80 * 4) mod
100 is 20, but when computed using 8-bit unsigned integers, you end up
computing (80 * 4) mod 256 mod 100, which is not correct.

> But given that the implementation looks like this https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/builtins/udivmodti4.c, and not 2 lines of assembly + move, it doesn't look promising

You seem to have some huge misconceptions here. The compiler
absolutely sees that one of the operands in a 128/64 division is only
64-bit and could emit the corresponding assembly. Vendors choose not
to do that and redirect all double-wide division to function calls of
e.g. __umodti3 which seems to forward to __udivmodti4.

The codegen you'll get for your proposed div_wide function is not
going to be any better. Both GCC and clang follow the same strategy
here; neither uses DIV/IDIV directly. Notice that when the divisor has
no high bits, __udivmodti4 redirects to udiv128by64to64 which is using
DIV/IDIV through inline assembly. You should read up on what GCC and
Clang actually do here and get in touch with implementers before
making more bold claims about codegen.

To be fair, there's a little bit of wasted performance by redirecting
through two functions, but the cost of division might still justify
doing that. There's perhaps a bit of work needed here in the event
that the compiler can guarantee that DIV won't raise an exception.
However, you typically don't know that anyway when doing modular
arithmetic. You know for sure that the remainder is representable, but
you can only use DIV when the quotient is representable.

To be honest, I'm starting to be sceptical whether div_wide should be
standardized in that form at all. It's basically standardizing a
quirky x64-specific instruction with no counterpart in other
architectures. rem_wide at least guarantees you a result, whether that
is through x64 DIV or other means.

I challenge you to come up with a use case where the user actually
wants the behavior that you've proposed for div_wide, which is that
you cannot do the operation without a prior check. Personally, I find
it hard to imagine such a case. People usually just want to perform
division, not perform division if they got lucky with their divisor.

Received on 2024-02-17 22:16:28