Date: Mon, 16 Feb 2026 09:46:09 -0500
On 2026-02-15 15:14, andrew drakeford via SG14 wrote:
> Hi all,
> Following the recent SG14 discussion on deterministic parallel
> reduction, I’ve now written up the proposal as a formal WG21 draft.
> The paper focuses on *expression-structure determinism* for parallel
> reductions: it defines a canonical pairwise reduction tree
> parameterised by a lane count /L/. Implementations (threads, SIMD,
> GPU) must produce results as-if evaluating that fixed tree.
> This extends the determinism guarantee of |std::accumulate| (fixed
> evaluation structure) into parallel and vectorised contexts, without
> imposing associativity or commutativity requirements.
> Draft paper (HTML):
> https://andyd123.github.io/canonical-reduce/generated/DxxxxR0.html
> As discussed, the reduction kernel follows the “iterated pairwise
> summation” approach described in:
> Dalton, Wang, Blainey — /“Fast, Accurate Summation of Floating-Point
> Numbers”/ (IBM Research)
> https://research.ibm.com/publications/fast-accurate-summation-of-floating-point-numbers
> I would particularly welcome feedback from SG14 on:
>
> * Whether the semantic contract is clear and appropriately scoped
> * Any concerns regarding interaction with existing parallel algorithms
> * Whether the lane-parameterised formulation is the right
>   abstraction boundary
>
> If there are no major objections, I intend to move this toward LEWG
> for initial direction.
Would it be worthwhile to float it past SG1 for feedback as well? They
might just rubber-stamp it, but it would probably be a good idea to at
least ensure they're aware of it.
> Best regards,
> Andrew
>
>
> _______________________________________________
> SG14 mailing list
> SG14_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/sg14
Received on 2026-02-16 14:46:16
