Date: Wed, 14 Jan 2026 00:10:29 +0100
Again, in other words: a compiler can internally map between the _BitInt(m) and int_mod<m> notations. It does not have to treat one as a template and the other not, since both are built-in types. A compiler can use any pre-computed optimization or any algorithm it can find. It can take the preferred target CPUs into account, or even detect the CPU at runtime and call the most suitable function.
If, as you say, an efficient algorithm can be expressed with recursive templates, those can be ported to the current _BitInt(m) implementation.
That would still be a huge improvement. But your answers do not make clear whether you agree, or whether you see some specific use for int_mod<m> for a reason we cannot see right now.
Most C++ code using these types probably does not care what the type is called?
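To illustrate, a minimal sketch only, assuming for the sake of the example that int_mod<Bits> is meant to have the same value set and wrap-around semantics as unsigned _BitInt(Bits): under that assumption, one notation can simply be spelled in terms of the other, so neither forces template machinery into the generated code. The alias name below is hypothetical, not the proposed type; _BitInt is a C23 type that Clang currently accepts in C++ as an extension.

    // Hypothetical sketch: assumes int_mod<Bits> means the same wrap-around
    // arithmetic as unsigned _BitInt(Bits).  _BitInt is a C23 type that Clang
    // currently accepts in C++ as an extension.
    template<int Bits>
    using int_mod = unsigned _BitInt(Bits);   // hypothetical alias, not the proposed type

    int_mod<256> mul(int_mod<256> a, int_mod<256> b)
    {
        return a * b;   // lowers to whatever the compiler emits for _BitInt(256)
    }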
-----Original Message-----
From: Hans Åberg via Std-Proposals <std-proposals_at_[hidden]>
Sent: Tue 13.01.2026 22:54
Subject: Re: [std-proposals] Modular integers
To: std-proposals_at_[hidden];
CC: Hans Åberg <haberg_1_at_[hidden]>;
> On 13 Jan 2026, at 22:30, Thiago Macieira via Std-Proposals <std-proposals_at_[hidden]> wrote:
>
> On Tuesday, 13 January 2026 13:20:54 Pacific Standard Time Hans Åberg via Std-
> Proposals wrote:
>>> Who said that compilers need a hand-rolled sequence of operations for each
>>> type? If you can create any algorithm using templates, then the compiler
>>> can do the SAME algorithm at a fraction of the cost,
>>> an order of magnitude faster than any template or constexpr operation, as
>>> it will run as raw x86 code, with the compiler selecting the sequence of
>>> x86 operations. And first of all it will be blazing fast at `-O0`, instead
>>> of the slog of dozens of superficial functions required by the template
>>> abstraction.
>>
>> There is a big difference between CPUs with the same compiler, Clang, at the
>> same 2/5 GHz clock frequency: a 2-by-1 division takes only about 4 ns on
>> Apple Silicon, whereas on an older Intel from 2019 it takes some 30–40 ns. The
>> difference seems to be in the pipelining: the instructions must be computed
>> in parallel for this low latency, and the written code must be structured
>> to admit this.
>
> You're not answering the questions.
Oh. Too many replies. :-)
> In fact, the answer you gave has absolutely nothing to do with Marcin's
> statement. Marcin said that if the compiler implements support for 256-bit
> integers, then it won't need to expand and instantiate templates, which in -O0
> mode means there are no non-inlined inline functions.
But that is the very point: the templates recurse down half-words until an implementation is reached. So in your example, it will just call the specialized 256-bit function that is implemented.
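For concreteness, here is a minimal sketch of that recursion (hypothetical names, addition only): the primary template splits a number into two half-width words, and a specialization stops the recursion at whatever width has a direct implementation.

    #include <cstdint>

    // Recursive case: split into two half-width words.
    template<int Bits>
    struct wide_uint {
        wide_uint<Bits / 2> lo, hi;   // value = hi * 2^(Bits/2) + lo
    };

    // Base case: a width the compiler/target implements directly.
    template<>
    struct wide_uint<64> {
        std::uint64_t v;
    };

    // Addition with carry-in/carry-out at the base width: a single hardware
    // add plus carry handling.
    inline bool add_carry(wide_uint<64>& out, wide_uint<64> a, wide_uint<64> b, bool cin)
    {
        std::uint64_t s = a.v + b.v;
        bool c1 = s < a.v;            // carry out of a + b
        out.v = s + cin;
        return c1 || (out.v < s);     // carry out of adding the carry-in
    }

    // Wider widths recurse down half-words until the base case is reached.
    template<int Bits>
    bool add_carry(wide_uint<Bits>& out, wide_uint<Bits> a, wide_uint<Bits> b, bool cin)
    {
        bool c = add_carry(out.lo, a.lo, b.lo, cin);
        return add_carry(out.hi, a.hi, b.hi, c);
    }

If a compiler implements 256-bit integers natively, the cut-off can be placed there instead, e.g. a wide_uint<256> specialization holding an unsigned _BitInt(256), so that a 512-bit addition calls the specialized 256-bit operation directly rather than recursing further.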
--
Std-Proposals mailing list
Std-Proposals_at_[hidden]
https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals
Received on 2026-01-13 23:26:26
