Hi,

I'm currently porting our code base from the shared PPL/TBB subset to the C++17 parallel algorithms. During this process I noticed that one "algorithm" is missing from the standard library: a parallelizable counted loop.

Whilst I normally prefer iterator-based algorithms (and maybe ranges in the near future), there are complex access patterns that are hard or nearly impossible to port to such an approach.

Therefore I want to propose that the following three overloads be added to the algorithm library:

void for_n(ExecutionPolicy && policy, Integral first, Integral last, Integral step, UnaryFunction f);

void for_n(ExecutionPolicy && policy, Integral first, Integral last, UnaryFunction f); //=> step == 1

void for_n(ExecutionPolicy && policy, Integral count, UnaryFunction f); //=> range [0, count), step == 1

All of them should be marked constexpr, follow the established noexcept convention, and only participate in overload resolution if is_integral_v<Integral> && is_execution_policy_v<decay_t<ExecutionPolicy>>.
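
To make the intended index sequences concrete, here is a minimal sequential sketch of the three overloads. It is not the linked implementation: the ExecutionPolicy parameter is deliberately left out so that it compiles without a parallel backend. The namespace sketch and the helper visited are hypothetical names used only for illustration. It also assumes a half-open range [first, last), a positive step, and that the index never overflows; the proposal text above does not pin those points down.

```cpp
#include <type_traits>
#include <vector>

namespace sketch {  // hypothetical namespace, for illustration only

// Calls f(first), f(first + step), ... while the index stays below last.
// Assumptions: half-open range [first, last), step >= 1, no overflow.
template <class Integral, class UnaryFunction,
          class = std::enable_if_t<std::is_integral_v<Integral>>>
constexpr void for_n(Integral first, Integral last, Integral step,
                     UnaryFunction f) {
    if (step < Integral{1}) return;  // non-positive steps are not handled here
    for (Integral i = first; i < last; i += step) f(i);
}

// Same as above with step == 1.
template <class Integral, class UnaryFunction,
          class = std::enable_if_t<std::is_integral_v<Integral>>>
constexpr void for_n(Integral first, Integral last, UnaryFunction f) {
    for_n(first, last, Integral{1}, f);
}

// Range [0, count) with step == 1.
template <class Integral, class UnaryFunction,
          class = std::enable_if_t<std::is_integral_v<Integral>>>
constexpr void for_n(Integral count, UnaryFunction f) {
    for_n(Integral{0}, count, Integral{1}, f);
}

}  // namespace sketch

// Hypothetical helper: records the indices the strided overload visits.
inline std::vector<int> visited(int first, int last, int step) {
    std::vector<int> out;
    sketch::for_n(first, last, step, [&out](int i) { out.push_back(i); });
    return out;
}
```

With these assumptions, visited(0, 10, 3) yields the indices 0, 3, 6, 9. A policy-taking version would additionally need to compute the iteration count up front so that the work can be split, and f would have to be safe to call concurrently.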

A sample implementation based on for_each is available at: https://github.com/MFHava/PSX/blob/for_n/inc/psx/for_n.hpp

Looking forward to feedback,

Michael