Date: Fri, 30 May 2025 03:06:49 +0200
>
> It's not about whether the types are efficient. It's about whether an
> implementation can be faster by implementing it out-of-line, with some
> dedicated assembly and possibly CPU-specific dispatching.
Implementing it out-of-line doesn't really get you anything there. It needs
to be inline obviously because these functions should be constexpr. You can
always add an "if !consteval" block and do inline assembly, intrinsics,
etc. in there. These functions are very small though and benefit a lot from
inlining, especially when the divisor is known to the optimizer.
> > > Paralleling the std::div function, should this return div_t? On some
> > > architectures, the division instruction already provides the modulus,
> so
> > > it
> > > would be useful to return both at the same time.
> >
> > To my knowledge, std::div is a historical relic which mostly exists
> because
> > the rounding mode of division was unspecified in C, but was truncating
> for
> > std::div. There's really no point in using it anymore. Every optimizing
> > compiler will fuse separate division and remainder operations into one
> div
> > instruction.
>
> For x86, yes. And that's the only architecture I particularly care about.
>
> But I don't know whether there are faster versions of division+modulus
> than
> what compilers implement for all other architectures, so it is possible
> that
> std::div() is still valuable. For example, for architectures without a
> division instruction and for which the compiler would insert runtime calls
> anyway.
>
Even if the compiler had to implement division in software for some arch,
the same principle of fusing division and remainder into one operation (and
then one function call) should apply.
Don't get me wrong, if someone produced a convincing use case for why these
functions need a combined division/remainder function, maybe I'd include
it. It just seems like an artificial/hypothetical need.
> > It's only superficially true that the reference implementation *always*
> > computes the remainder. Many of the functions only need to check if the
> > remainder is nonzero, which is possibly cheaper. For example, if the
> > optimizer knows that the dividend is lower than the divisor, there will
> > always be a remainder (unless it's zero). Also, the remainder still needs
> > to be adjusted to match the adjustments made to the quotient for the
> > rounding mode. You're not getting the remainder entirely for free.
>
> I'm not saying it would be. I am asking if it is cheaper to get them in a
> fused function compared to two calls to independent functions.
>
My intuition is that these functions only emit a small handful of IR
instructions anyway, and I've checked this for LLVM. Inlining is super
likely for them, and once that happens, separate divisions/remainders get
fused, and common parts of IR get optimized into one big block anyway.
> > Anyhow, I think you're reading too much into std::div. Combined
> > quotient/remainder functions seem unmotivated to me, and they're not
> > entirely trivial to implement if you need high QoI instead of just
> > computing the remainder in terms of the quotient. For a lot of these
> > functions, the remainder isn't particularly interesting anyway, and you
> > could just compute it in terms of the quotient if you needed it.
>
> I can throw the QoI argument back at you: compilers should just use the
> faster
> version if the quotient is unused. They do that for std::div.
>
It's not a QoI issue though if the user is only interested in the quotient
and they have to grab it out of some struct though. That would be an API
ergonomics issue.
> One common case of needing both are time unit conversions: converting from
> a
> seconds counter to hours, minutes and seconds. Or converting to struct
> timespec.
>
Yeah fair point, these things happen. You only commonly need both for
specific rounding modes though. In your time unit example, that would
presumably involve rounding towards zero, and getting the remainder for
that is built into the language. If it used rounding towards negative
infinity, that's also covered with std::rem_divisor_sign.
Once you get into the more exotic modes like std::div_ties_, it's unclear
what you actually need the remainder for, and the adjustments to the result
of (x / y) made are so complex that you're better off computing the
remainder using the quotient afterwards than to make matching adjustments.
> > The implementation cost for anything to do with floating-point is much
> > greater, and so different standards and design practices apply.
>
> That's understood. But the fact is they exist and are understood. Adding
> more
> modes can be confusing. All I'm asking is to limit what isn't in FP to
> what
> you can justify a need for, not just because you had the math.
Yeah fair point, there's a possibility of introducing a dilemma of choice.
I'm not really convinced it would be a big problem though. The user can
just not call std::div_to_odd if they don't need it.
I'm also worried about creating some swiss cheese of a function set where
some rounding modes are there top-level, some only for tie breaking, and
you'll never hear the end of the bikeshedding which ones should be included
or not. It's quite simple to just include all of them since they're all
small and easy to implement.
> It's not about whether the types are efficient. It's about whether an
> implementation can be faster by implementing it out-of-line, with some
> dedicated assembly and possibly CPU-specific dispatching.
Implementing it out-of-line doesn't really get you anything there. It needs
to be inline obviously because these functions should be constexpr. You can
always add an "if !consteval" block and do inline assembly, intrinsics,
etc. in there. These functions are very small though and benefit a lot from
inlining, especially when the divisor is known to the optimizer.
> > > Paralleling the std::div function, should this return div_t? On some
> > > architectures, the division instruction already provides the modulus,
> so
> > > it
> > > would be useful to return both at the same time.
> >
> > To my knowledge, std::div is a historical relic which mostly exists
> because
> > the rounding mode of division was unspecified in C, but was truncating
> for
> > std::div. There's really no point in using it anymore. Every optimizing
> > compiler will fuse separate division and remainder operations into one
> div
> > instruction.
>
> For x86, yes. And that's the only architecture I particularly care about.
>
> But I don't know whether there are faster versions of division+modulus
> than
> what compilers implement for all other architectures, so it is possible
> that
> std::div() is still valuable. For example, for architectures without a
> division instruction and for which the compiler would insert runtime calls
> anyway.
>
Even if the compiler had to implement division in software for some arch,
the same principle of fusing division and remainder into one operation (and
then one function call) should apply.
Don't get me wrong, if someone produced a convincing use case for why these
functions need a combined division/remainder function, maybe I'd include
it. It just seems like an artificial/hypothetical need.
> > It's only superficially true that the reference implementation *always*
> > computes the remainder. Many of the functions only need to check if the
> > remainder is nonzero, which is possibly cheaper. For example, if the
> > optimizer knows that the dividend is lower than the divisor, there will
> > always be a remainder (unless it's zero). Also, the remainder still needs
> > to be adjusted to match the adjustments made to the quotient for the
> > rounding mode. You're not getting the remainder entirely for free.
>
> I'm not saying it would be. I am asking if it is cheaper to get them in a
> fused function compared to two calls to independent functions.
>
My intuition is that these functions only emit a small handful of IR
instructions anyway, and I've checked this for LLVM. Inlining is super
likely for them, and once that happens, separate divisions/remainders get
fused, and common parts of IR get optimized into one big block anyway.
> > Anyhow, I think you're reading too much into std::div. Combined
> > quotient/remainder functions seem unmotivated to me, and they're not
> > entirely trivial to implement if you need high QoI instead of just
> > computing the remainder in terms of the quotient. For a lot of these
> > functions, the remainder isn't particularly interesting anyway, and you
> > could just compute it in terms of the quotient if you needed it.
>
> I can throw the QoI argument back at you: compilers should just use the
> faster
> version if the quotient is unused. They do that for std::div.
>
It's not a QoI issue though if the user is only interested in the quotient
and they have to grab it out of some struct though. That would be an API
ergonomics issue.
> One common case of needing both are time unit conversions: converting from
> a
> seconds counter to hours, minutes and seconds. Or converting to struct
> timespec.
>
Yeah fair point, these things happen. You only commonly need both for
specific rounding modes though. In your time unit example, that would
presumably involve rounding towards zero, and getting the remainder for
that is built into the language. If it used rounding towards negative
infinity, that's also covered with std::rem_divisor_sign.
Once you get into the more exotic modes like std::div_ties_, it's unclear
what you actually need the remainder for, and the adjustments to the result
of (x / y) made are so complex that you're better off computing the
remainder using the quotient afterwards than to make matching adjustments.
> > The implementation cost for anything to do with floating-point is much
> > greater, and so different standards and design practices apply.
>
> That's understood. But the fact is they exist and are understood. Adding
> more
> modes can be confusing. All I'm asking is to limit what isn't in FP to
> what
> you can justify a need for, not just because you had the math.
Yeah fair point, there's a possibility of introducing a dilemma of choice.
I'm not really convinced it would be a big problem though. The user can
just not call std::div_to_odd if they don't need it.
I'm also worried about creating some swiss cheese of a function set where
some rounding modes are there top-level, some only for tie breaking, and
you'll never hear the end of the bikeshedding which ones should be included
or not. It's quite simple to just include all of them since they're all
small and easy to implement.
Received on 2025-05-30 01:06:25
