ISOCPP std-proposals List: Re: [std-proposals] Modular integers

From: (wrong string) Åberg <haberg_1_at_[hidden]>
Date: Tue, 13 Jan 2026 18:47:00 +0100

> On 13 Jan 2026, at 18:07, Jan Schultke <janschultke_at_[hidden]> wrote:
>
>> Compilers already do a reasonably good job at lowering _BitInt(128) and _BitInt(4096) operations, and there is no innate reason why they would do a better job at _BitModInt(4096) or mod_int<4096> or whatever you're proposing. I don't see any scenario in which your proposed mod_int wouldn't just be an alias template for unsigned _BitInt.
>
> It is important for pipelining to avoid backward jumps, like in loops. Despite Clang having an efficient 2-word by 1-word 64-bit implementation, its 2-word by 2-word division uses binary division, which is slow. In addition, 2-word by 1-word division can be optimized to only use one multiplication per half-word.
>
> That sounds like a missed optimization. You could file a bug report for LLVM or fix the issue yourself in a PR.

I made one for the 2-word by 1-word implementation, as they did not do the division add back stage, so I had to fill in the math for them why it is not needed. This then led to the optimization where to where only one multiplication per half-word is needed.

My 2-word by 2-word implementation is a template, whereas LLVM has an explicit 64-bit.

> Why would you expect an int_mod<128> to use a more efficient division than _BitInt(128)?

The latter could be specialized to use the former, but I think one will typically see a 32-64 bit implementation, not even a full 64 bit one, with loops.

The former can be implemented using special 2-word by 1-word optimization not possible for general multiprecision division.

The LLVM 2-word by 2-word 64-bit implementation has a loop, and taking that loop away had a dramatic increase in speed, nearly halving the latency, involving some tricky stuff in the implementation.

Received on 2026-01-13 17:47:15