Date: Thu, 29 May 2025 21:45:36 -0300
On Thursday, 29 May 2025 21:28:47 Brasilia Standard Time Jan Schultke wrote:
> > You're proposing an interface with templates. Are you sure they can be
> > implemented efficiently inline for all architectures? Shouldn't they be
> > out-of-
> > line and provide only int, long, and long long support via overloads?
>
> I'm not really sure what you mean by that. There is no guarantee that "long
> long" or "int" are "efficient" either, and the standard generally doesn't
> care about this. The other functions in <numeric> such as saturating
> arithmetic or gcd are also templates, and accept any integer type. Whether
> that is "efficient" is a QoI issue. I'm just following conventions.
It's not about whether the types are efficient. It's about whether an
implementation can be faster by implementing it out-of-line, with some
dedicated assembly and possibly CPU-specific dispatching.
<numeric> isn't the only header whose pattern to follow. std::div is in
<cstdlib>.
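To make that concrete, here's roughly the shape I mean (hypothetical names, not taken from your paper): non-template overloads declared in the header but defined out of line, so the library is free to use dedicated assembly or CPU-specific dispatching:

    // declarations only; the definitions live in the library, where they
    // can be hand-tuned or select a CPU-specific path at load time
    int       div_ceil(int dividend, int divisor);
    long      div_ceil(long dividend, long divisor);
    long long div_ceil(long long dividend, long long divisor);

That's the std::div model, as opposed to a header-only template that has to be instantiable inline for every integer type.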
> > Paralleling the std::div function, should this return div_t? On some
> > architectures, the division instruction already provides the modulus, so
> > it
> > would be useful to return both at the same time.
>
> To my knowledge, std::div is a historical relic which mostly exists because
> the rounding mode of division was unspecified in C, but was truncating for
> std::div. There's really no point in using it anymore. Every optimizing
> compiler will fuse separate division and remainder operations into one div
> instruction.
For x86, yes. And that's the only architecture I particularly care about.
But I don't know whether, on all the other architectures, there are faster ways
to get division+modulus than what the compilers currently emit, so it is
possible that std::div() is still valuable. For example, on architectures
without a division instruction, where the compiler would insert runtime library
calls anyway.
In any case, returning both from the functions you're proposing still makes
sense, to help the compilers' optimisers. The operations may be just complex
enough that the optimisers would be thrown off and do a worse job than a fused
function when the code needs both results. Though this is indeed QoI.
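Something like this is what I have in mind, as a minimal sketch (the name and exact shape are hypothetical, not a proposed spelling):

    // fused ceiling division: one actual division, then quotient and
    // remainder are adjusted together for the rounding mode
    struct div_ceil_result { long long quot, rem; };

    constexpr div_ceil_result div_ceil_rem(long long a, long long b)
    {
        long long q = a / b;
        long long r = a % b;
        if (r != 0 && (a < 0) == (b < 0)) {
            // exact quotient is positive and fractional: round it up,
            // and adjust the remainder so a == q * b + r still holds
            ++q;
            r -= b;
        }
        return {q, r};
    }

The point is that the caller gets both results out of the one division the hardware already performed.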
> It's only superficially true that the reference implementation *always*
> computes the remainder. Many of the functions only need to check if the
> remainder is nonzero, which is possibly cheaper. For example, if the
> optimizer knows that the dividend is lower than the divisor, there will
> always be a remainder (unless it's zero). Also, the remainder still needs
> to be adjusted to match the adjustments made to the quotient for the
> rounding mode. You're not getting the remainder entirely for free.
I'm not saying it would be. I am asking whether it is cheaper to get them from
one fused function than from two calls to independent functions.
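Concretely, the comparison is between two independent functions that each do their own division and adjustment (again a sketch with hypothetical names):

    // each call performs a division; can the optimiser still merge them?
    constexpr long long div_ceil(long long a, long long b)
    {
        return a / b + (a % b != 0 && (a < 0) == (b < 0));
    }

    constexpr long long mod_ceil(long long a, long long b)
    {
        long long r = a % b;
        return (r != 0 && (a < 0) == (b < 0)) ? r - b : r;
    }

and one fused call like the one sketched above, which adjusts both results after a single division.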
> Anyhow, I think you're reading too much into std::div. Combined
> quotient/remainder functions seem unmotivated to me, and they're not
> entirely trivial to implement if you need high QoI instead of just
> computing the remainder in terms of the quotient. For a lot of these
> functions, the remainder isn't particularly interesting anyway, and you
> could just compute it in terms of the quotient if you needed it.
I can throw the QoI argument back at you: compilers should just use the faster
version if the quotient is unused. They do that for std::div.
One common case of needing both is time unit conversions: converting a seconds
counter to hours, minutes and seconds, or converting to struct timespec.
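For example, with today's std::div family (a minimal sketch):

    #include <cstdlib>

    struct hms { long long hours, minutes, seconds; };

    // split a seconds counter into hours/minutes/seconds; every step
    // needs both the quotient and the remainder of the same division
    hms split_time(long long total_seconds)
    {
        std::lldiv_t m = std::lldiv(total_seconds, 60); // quot: total minutes, rem: seconds
        std::lldiv_t h = std::lldiv(m.quot, 60);        // quot: hours, rem: minutes
        return {h.quot, h.rem, m.rem};
    }

And a floor or Euclidean variant of the same shape is what you want for negative counters, since timespec's tv_nsec must stay non-negative.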
> The implementation cost for anything to do with floating-point is much
> greater, and so different standards and design practices apply.
That's understood. But the fact is that the FP rounding modes already exist and
are understood, and adding more modes can be confusing. All I'm asking is that
the modes which don't exist for FP be limited to those you can justify a need
for, not added just because the math works out.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Principal Engineer - Intel Platform & System Engineering