ISOCPP std-proposals List: Re: [std-proposals] Multiprecision division

From: Arthur O'Dwyer <arthur.j.odwyer_at_[hidden]>
Date: Fri, 8 Aug 2025 13:17:33 -0400

On Thu, Aug 7, 2025 at 3:45 PM Oliver Hunt via Std-Proposals <
std-proposals_at_[hidden]> wrote:

> On Aug 7, 2025, at 11:45 AM, Hans Åberg <haberg_1_at_[hidden]> wrote:
>
> On 7 Aug 2025, at 19:53, Oliver Hunt <oliver_at_[hidden]> wrote:
>
> On Aug 7, 2025, at 5:44 AM, Hans Åberg via Std-Proposals <
> std-proposals_at_[hidden]> wrote:
>
> I made a low-level multiprecision division function, similar in intent to
> the proposal https://isocpp.org/files/papers/P3161R4.html:
>
> For an unsigned type Word, dividend a[] of size m, divisor b[] of size n,
> the quotient is written into q[] and the remainder into a[]:
> template<class Word>
> inline void div(Word a[], size_t m, const Word b[], size_t n, Word q[])
> By this structure, there is no need for internal allocations, which is
> important for speed. The “const” part is done via virtual shifts, which can
> be avoided by passing a b[] with the high bit set.
>
>
> Rather than Word a[] these would really need to be at least
> std::span<Word> - we can harden span, but not raw pointers.
>
> A lot of APIs seem to be written generically over iterators, but I find
> those APIs obnoxiously verbose - but also in practice I doubt anyone plans
> to make a multiprecision library using std::set<word>s :D
>
>
> One reason I chose a[] is to avoid overhead given the low programming
> level. This excludes iterators, as division goes top down, and reverse
> iterators are offset by one, with lots of hidden additions.
>
> Another is that I could not find a good C++ construct that admits
> switching between dynamic and static allocations. A quick test on std::span
> shows that problem, but it could be that it can be made to work. Otherwise,
> I would prefer to keep size and value together, as in std::span.
>
>
> Adding new pointer based apis just is not something we can reasonably
> accept at this point, and std::span is exactly equivalent to bounded
> pointers - can you point to your code (or on godbolt?) showing the
> difference in codegen?
>
> If there’s a significant difference in codegen that implies an optimizer
> bug rather than an X-is-bad problem.
>

Or an ABI deficiency. We can see this clearly on Windows:
https://godbolt.org/z/oT4zWzThf

One big advantage of providing an API that uses a core-language
pointer-length pair, or a templated API that takes arbitrary iterators, is
that such an API has no dependency on the Standard Library. You can use
that API without dragging in anything else from the STL at all. This is
(sadly) a design philosophy no longer followed by most of the people on
WG21, but it's a good philosophy in practice — to *keep your dependency
tree shallow*, especially in cases like this one where the API with the
shallow dependency tree is actually *more performant* than the heavyweight
std::span-based one.
Additionally, in this case it won't matter what philosophy the
WG21 folks follow, because this code is a perfect candidate for you to *publish
yourself as a single-header library with no dependencies*. Like, just go
ahead and put it on GitHub. Don't gate it on trying to get a formal paper
through WG21 or put it in the Standard Library; just ship it! If it's a
good API, people will use it; if it's got defects, they'll tell you about
them and you can fix them (which you cannot do if it becomes part of the
Standard).

There is no reason at all to put this specific code into the Standard
Library. Hans, publish your implementation and be done.

–Arthur

Received on 2025-08-08 17:17:49