C++ Logo

std-proposals

Advanced search

Re: [std-proposals] Modular integers

From: Hans Åberg <haberg_1_at_[hidden]>
Date: Tue, 13 Jan 2026 19:12:28 +0100
> On 13 Jan 2026, at 19:03, Jan Schultke <janschultke_at_[hidden]> wrote:
>
> On Tue, Jan 13, 2026, 18:47 Hans Åberg <haberg_1_at_[hidden]> wrote:
>
> > On 13 Jan 2026, at 18:07, Jan Schultke <janschultke_at_[hidden]> wrote:
> >
> >> Compilers already do a reasonably good job at lowering _BitInt(128) and _BitInt(4096) operations, and there is no innate reason why they would do a better job at _BitModInt(4096) or mod_int<4096> or whatever you're proposing. I don't see any scenario in which your proposed mod_int wouldn't just be an alias template for unsigned _BitInt.
> >
> > It is important for pipelining to avoid backward jumps, like in loops. Despite Clang having an efficient 2-word by 1-word 64-bit implementation, its 2-word by 2-word division uses binary division, which is slow. In addition, 2-word by 1-word division can be optimized to only use one multiplication per half-word.
> >
> > That sounds like a missed optimization. You could file a bug report for LLVM or fix the issue yourself in a PR.
>
> I made one for the 2-word by 1-word implementation, as they did not do the division add back stage, so I had to fill in the math for them why it is not needed. This then led to the optimization where to where only one multiplication per half-word is needed.
>
> Seems like a fixable problem, not like an innate limitation of the _BitInt notation.

They had to do nothing, except possibly add a comment as to why it is excluded.

> My 2-word by 2-word implementation is a template, whereas LLVM has an explicit 64-bit.
>
> > Why would you expect an int_mod<128> to use a more efficient division than _BitInt(128)?
>
>> The latter could be specialized to use the former, but I think one will typically see a 32-64 bit implementation, not even a full 64 bit one, with loops.
>>
>> The former can be implemented using special 2-word by 1-word optimization not possible for general multiprecision division.
>>
>> The LLVM 2-word by 2-word 64-bit implementation has a loop, and taking that loop away had a dramatic increase in speed, nearly halving the latency, involving some tricky stuff in the implementation.
>>
> That doesn't answer the question in the slightest. If the standard mandates that an implementation provides both _BitInt and mod_int, why would an implementation choose to implement one with faster decision while leaving the other deliberately slow, or be innately unable to make both fast? If there is no reason, this proposal is obviously pointless.

One way to optimize _BitInt(N) for time might be to find a word of 2^⁽2^k) bits that it fits into, and then use recursive templates halving the words. Then the problem is this requires C++, not available in C. So it goes back to the problem that this is a C type that inherits its limitations. But then, there would be no difference in performance between these types.

Received on 2026-01-13 18:12:45