Date: Wed, 14 Jan 2026 10:29:48 +0000
On 14 January 2026 10:04:23 GMT, "Hans Åberg via Std-Proposals" <std-proposals_at_[hidden]> wrote:
>
>> On 14 Jan 2026, at 10:43, Jan Schultke <janschultke_at_[hidden]> wrote:
>>
>> Yes, that is how I discovered it was much faster. It is a template, essentially an implementation of div_wide in this proposal:
>> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3161r4.html#functions.div_wide
>>
>> LLVM implements it with a loop. I made recursive templates, removing the loop and all multiplications except one per halfword.
>>
>> Tiago Freire has a copy which might be used for a reference implementation, in case you would like to add requirements that compilers have it, not merely being optional.
>>
>> If you have an optimization opportunity that LLVM does not take, why don't you make an LLVM PR or bug report instead of a C++ proposal?
>
>The optimizations are not a part of a C++ proposal, clearly, since the standard does not make such statements. There is usually a requirement of at least one fairly efficient implementation, leaving it open for improvements, and that is what I indicated.
>
>The LLVM code is in C, so templates won't help there. They seem happy with what they have.
LLVM is written in C++, not that it matters. It could be written in Python and still emit efficient assembly.
>
>>> And if so, why can't the compiler do
>>> the same? What is the reason it cannot do the same?
>>
>> Why don't you ask the compiler or compiler writer? :-)
>>
>> Why don't we ask you? You're the one arguing there needs to be a mod_int C++ feature, so you're the one who needs to motivate it.
>>
>> In order to remove the loop, one has to restructure the condition. For a full word implementation, one has to use two's complement features. In addition, one uses the mathematical facts that the preliminary division overshoots by at most 2 and that the add-back step is not needed, which can be further exploited to eliminate a final multiplication.
>>
>> And why is it innately impossible for LLVM to perform this two's-complement-based and mathematical optimization for unsigned _BitInt(128), unlike for mod_int<128>?
>
>Try to make LLVM transform this traditional 32-64 bit (32-bit word) implementation, using signed integers here below, into a full word 64-bit implementation, using only unsigned types.
>https://raw.githubusercontent.com/hcs0/Hackers-Delight/master/divmnu64.c.txt
>
Received on 2026-01-14 10:29:55
