Date: Tue, 13 Jan 2026 15:58:59 +0100
> On 13 Jan 2026, at 15:47, Alejandro Colomar <une+cxx_std-proposals_at_[hidden]> wrote:
>
> On Tue, Jan 13, 2026 at 03:36:37PM +0100, Hans Åberg via Std-Proposals wrote:
>> It is probably too complicated to do by hand. By contrast, recursive templates can halve the words until there is hardware support, the latter of which can be implemented by specializing an inline function.
>
> Why would templates be able to be faster than _BitInt()? Is there any
> inherent issue in _BitInt() that doesn't allow compilers to implement it
> efficiently?
The templates are recursive, so they halve the words until one can replace them with hardware-supported assembler. Since the recursion is resolved at compile time, the template functions have no backward jumps, which encourages pipelining; writing such loop-free code by hand, instead of iterating, is very complicated.
Say one is on a 64-bit platform. For division, if there is a divq instruction, one can use it; if not, the template recurses down to a 32-bit implementation. On a platform with a larger word size, one can use that width instead by specializing an inline function.
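
A minimal sketch of the structure described above, not an actual implementation. It uses addition with carry rather than division to stay short, and it assumes that 64 bits is the hardware-supported width. The names UInt, add, and make256 are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical N-bit unsigned integer built from two N/2-bit halves.
template <unsigned N>
struct UInt {
    static_assert(N > 64 && (N & (N - 1)) == 0,
                  "N must be a power of two above the base width");
    UInt<N / 2> lo, hi;
};

// The halving stops at the width assumed to have hardware support.
template <>
struct UInt<64> {
    std::uint64_t v;
};

// Generic case: add the low halves, then the high halves with the carry.
// The recursion is resolved at compile time, so there is no loop here.
template <unsigned N>
inline bool add(const UInt<N>& a, const UInt<N>& b, bool carry_in, UInt<N>& r) {
    bool carry = add(a.lo, b.lo, carry_in, r.lo);
    return add(a.hi, b.hi, carry, r.hi);
}

// Base case: the specialized inline function for the hardware word.
// Plain C++ is used here; a platform-specific version could use an
// intrinsic or inline assembler instead.
template <>
inline bool add<64>(const UInt<64>& a, const UInt<64>& b, bool carry_in,
                    UInt<64>& r) {
    std::uint64_t s = a.v + b.v;   // wraps modulo 2^64
    bool c1 = s < a.v;             // carry out of a + b
    r.v = s + (carry_in ? 1u : 0u);
    bool c2 = r.v < s;             // carry out of adding carry_in
    return c1 || c2;
}

// Helper: build a 256-bit value from four 64-bit words, least significant first.
inline UInt<256> make256(std::uint64_t w0, std::uint64_t w1,
                         std::uint64_t w2, std::uint64_t w3) {
    return UInt<256>{{{w0}, {w1}}, {{w2}, {w3}}};
}
```

Calling add on two UInt<256> values instantiates add<256> and add<128>, which bottom out in four calls to the add<64> specialization. Supporting a different hardware width would mean specializing UInt and add at that width instead, which is the role of the specialized inline function mentioned above.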
Received on 2026-01-13 14:59:16
