Date: Thu, 12 Jun 2025 14:17:15 +0200
> On 12 Jun 2025, at 13:29, Tiago Freire <tmiguelf_at_[hidden]> wrote:
>
> The suggestion that div_wide should have 2 words for the result (+ remainder) instead of the current 1 because of the potential overflow, is something that has come up before.
Only the overflow is needed in multiprecision division, not the upper word. If 𝛽 is the base, the overflow is at most 𝛽 + 2, so the upper word is never greater than 1. It is necessary to detect the overflow, but the quotient approximation can immediately be set to 𝛽 - 1, which is -1 in the unsigned type.
> 2. CPU's don't have such an instruction.
So the question is whether CPUs can detect overflow, and then report it as a boolean. I have written:
inline std::tuple<uint32_t, uint32_t, bool> div(uint32_t a0, uint32_t a1, uint32_t b)
{
uint64_t a = ((uint64_t)(a1) << 32) | (uint64_t)a0;
uint64_t q = a/b;
uint64_t r = a - q*b;
return {q, r, q >> 32};
}
And similar for uint8_t, uint16_t.
There remains the case of uint64_t, which is really what the question is about.
>
> The suggestion that div_wide should have 2 words for the result (+ remainder) instead of the current 1 because of the potential overflow, is something that has come up before.
Only the overflow is needed in multiprecision division, not the upper word. If 𝛽 is the base, the overflow is at most 𝛽 + 2, so the upper word is never greater than 1. It is necessary to detect the overflow, but the quotient approximation can immediately be set to 𝛽 - 1, which is -1 in the unsigned type.
> 2. CPU's don't have such an instruction.
So the question is whether CPUs can detect overflow, and then report it as a boolean. I have written:
inline std::tuple<uint32_t, uint32_t, bool> div(uint32_t a0, uint32_t a1, uint32_t b)
{
uint64_t a = ((uint64_t)(a1) << 32) | (uint64_t)a0;
uint64_t q = a/b;
uint64_t r = a - q*b;
return {q, r, q >> 32};
}
And similar for uint8_t, uint16_t.
There remains the case of uint64_t, which is really what the question is about.
Received on 2025-06-12 12:17:51