Date: Tue, 15 Aug 2023 01:25:00 +0900

Hi,

The standard says `init` should be accumulated only once for parallel STL

(e.g., reduce(), transform_reduce(), exclusive_scan()), but this is

contradicting the concept of "monoid", which is often used for functional

programming for parallel patterns.

std::reduce():

```

template<class ExecutionPolicy, class ForwardIterator, class T, class

BinaryOperation>

T reduce(ExecutionPolicy&& exec,

ForwardIterator first, ForwardIterator last, T init,

BinaryOperation binary_op);

```

In the above std::reduce() example, the pair of `binary_op` and `init` does

not constitute a (commutative) monoid, because `init` may not be an

"identity" element.

An identity element satisfies the following:

binary_op(identity, x) = x

binary_op(x, identity) = x

This means that the final result does not change regardless of how many

times the identity element is accumulated.

This property is beneficial for parallel implementations.

With an identity element, std::reduce() can be straightforwardly

parallelized for each thread:

```

// [first, last) is a subrange given to this thread

T acc = identity;

for (auto it = first; it != last; ++it) {

acc = binary_op(acc, *it);

}

// combine `acc` with other threads

```

However, in reality, STL does not have an identity element, and `init`

should be accumulated only once.

Thus, the code will look like:

```

T acc = *first;

for (auto it = first + 1; it != last; ++it) {

acc = binary_op(acc, *it);

}

// combine `acc` with other threads

```

However, according to the following statement,

> All of binary_op(init, *first), binary_op(*first, init),

binary_op(init, init), and binary_op(*first, *first) shall be convertible

to T.

the type of each value (*first) is not necessarily T.

Thus, initialization of acc for each thread (`T acc = *first;`) may not be

correct.

It should be changed to `T acc = binary_op(*first, *(first+1))`, but what

if each thread is assigned only one element?

In fact, an existing STL implementation assumes `*first` is of type T:

https://github.com/KhronosGroup/SyclParallelSTL/blob/b2cdb3580cc581172ed4de133b05d112afde5b50/include/sycl/algorithm/buffer_algorithms.hpp#L206

Therefore, I think the standard may want to consider this issue either:

- by assuming `*first` is of type T for parallel execution, or

- by taking an argument for an identity element to construct a monoid.

The standard says `init` should be accumulated only once for parallel STL

(e.g., reduce(), transform_reduce(), exclusive_scan()), but this is

contradicting the concept of "monoid", which is often used for functional

programming for parallel patterns.

std::reduce():

```

template<class ExecutionPolicy, class ForwardIterator, class T, class

BinaryOperation>

T reduce(ExecutionPolicy&& exec,

ForwardIterator first, ForwardIterator last, T init,

BinaryOperation binary_op);

```

In the above std::reduce() example, the pair of `binary_op` and `init` does

not constitute a (commutative) monoid, because `init` may not be an

"identity" element.

An identity element satisfies the following:

binary_op(identity, x) = x

binary_op(x, identity) = x

This means that the final result does not change regardless of how many

times the identity element is accumulated.

This property is beneficial for parallel implementations.

With an identity element, std::reduce() can be straightforwardly

parallelized for each thread:

```

// [first, last) is a subrange given to this thread

T acc = identity;

for (auto it = first; it != last; ++it) {

acc = binary_op(acc, *it);

}

// combine `acc` with other threads

```

However, in reality, STL does not have an identity element, and `init`

should be accumulated only once.

Thus, the code will look like:

```

T acc = *first;

for (auto it = first + 1; it != last; ++it) {

acc = binary_op(acc, *it);

}

// combine `acc` with other threads

```

However, according to the following statement,

> All of binary_op(init, *first), binary_op(*first, init),

binary_op(init, init), and binary_op(*first, *first) shall be convertible

to T.

the type of each value (*first) is not necessarily T.

Thus, initialization of acc for each thread (`T acc = *first;`) may not be

correct.

It should be changed to `T acc = binary_op(*first, *(first+1))`, but what

if each thread is assigned only one element?

In fact, an existing STL implementation assumes `*first` is of type T:

https://github.com/KhronosGroup/SyclParallelSTL/blob/b2cdb3580cc581172ed4de133b05d112afde5b50/include/sycl/algorithm/buffer_algorithms.hpp#L206

Therefore, I think the standard may want to consider this issue either:

- by assuming `*first` is of type T for parallel execution, or

- by taking an argument for an identity element to construct a monoid.

Received on 2023-08-14 16:25:11