C++ Logo

std-discussion

Advanced search

Parallel STL std::reduce() and monoid

From: Shumpei Shiina <s417.lama_at_[hidden]>
Date: Tue, 15 Aug 2023 01:25:00 +0900
Hi,

The standard says `init` should be accumulated only once for parallel STL
(e.g., reduce(), transform_reduce(), exclusive_scan()), but this is
contradicting the concept of "monoid", which is often used for functional
programming for parallel patterns.

std::reduce():
```
  template<class ExecutionPolicy, class ForwardIterator, class T, class
BinaryOperation>
    T reduce(ExecutionPolicy&& exec,
             ForwardIterator first, ForwardIterator last, T init,
             BinaryOperation binary_op);
```

In the above std::reduce() example, the pair of `binary_op` and `init` does
not constitute a (commutative) monoid, because `init` may not be an
"identity" element.

An identity element satisfies the following:
binary_op(identity, x) = x
binary_op(x, identity) = x

This means that the final result does not change regardless of how many
times the identity element is accumulated.
This property is beneficial for parallel implementations.

With an identity element, std::reduce() can be straightforwardly
parallelized for each thread:
```
// [first, last) is a subrange given to this thread
T acc = identity;
for (auto it = first; it != last; ++it) {
  acc = binary_op(acc, *it);
}
// combine `acc` with other threads
```

However, in reality, STL does not have an identity element, and `init`
should be accumulated only once.
Thus, the code will look like:
```
T acc = *first;
for (auto it = first + 1; it != last; ++it) {
  acc = binary_op(acc, *it);
}
// combine `acc` with other threads
```

However, according to the following statement,

> All of binary_­op(init, *first), binary_­op(*first, init),
binary_­op(init, init), and binary_­op(*first, *first) shall be convertible
to T.

the type of each value (*first) is not necessarily T.

Thus, initialization of acc for each thread (`T acc = *first;`) may not be
correct.
It should be changed to `T acc = binary_op(*first, *(first+1))`, but what
if each thread is assigned only one element?

In fact, an existing STL implementation assumes `*first` is of type T:
https://github.com/KhronosGroup/SyclParallelSTL/blob/b2cdb3580cc581172ed4de133b05d112afde5b50/include/sycl/algorithm/buffer_algorithms.hpp#L206

Therefore, I think the standard may want to consider this issue either:
- by assuming `*first` is of type T for parallel execution, or
- by taking an argument for an identity element to construct a monoid.

Received on 2023-08-14 16:25:11