Date: Sat, 16 May 2020 21:53:07 +0000
On Saturday, May 16, 2020 6:01 AM, kate via Std-Proposals <std-proposals_at_[hidden]> wrote:
> I found a novel approach to implement similar interfaces. The idea is
> very simple: unroll the loops introduced by vector operations at
> compile time and leave the rest work, i.e. generating code using SIMD
> instructions, to compilers. Then the vectorization is done. It relies
> on compilers' optimization ability to generate vectorized instructions
> from unrolled code. For modern compiler, it's should be easy.
>
I think many of us have done this manually
before, e.g. tweaking the code until it gets
auto-vectorized. Having this accomplished via
a library is attractive for some uses.
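As a rough illustration of the library idea (a sketch, not the proposed implementation), a fixed-width vector type could expand its lane-wise operations with a pack expansion, so no loop is left in the source at all. The names `vec`, `lanes`, and `add_impl` are hypothetical, and whether SIMD instructions are actually emitted depends on the optimization level and target flags.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Hypothetical fixed-width vector type; not taken from any existing library.
template <class T, std::size_t N>
struct vec {
    std::array<T, N> lanes;
};

// The pack expansion produces N independent scalar additions at compile
// time. Turning them into SIMD instructions is left to the optimizer.
template <class T, std::size_t N, std::size_t... I>
constexpr vec<T, N> add_impl(const vec<T, N>& a, const vec<T, N>& b,
                             std::index_sequence<I...>) {
    return vec<T, N>{{ static_cast<T>(a.lanes[I] + b.lanes[I])... }};
}

template <class T, std::size_t N>
constexpr vec<T, N> operator+(const vec<T, N>& a, const vec<T, N>& b) {
    return add_impl(a, b, std::make_index_sequence<N>{});
}
```

Used as `vec<float, 4> c = a + b;`, this reads like a SIMD type, but nothing in the code names an instruction set.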
But I can see a few things this approach
may have trouble with:
1. SIMD code generation in Debug builds.
This approach couples the use of SIMD
algorithms with optimization. So in a Debug
build the program may be too slow, or it
can produce different numerical results
because different instructions are generated.
2. Control over the ISA.
This approach does not abstract out the ISA,
so the ISA in use is determined by the build
rather than by the code. Not all SIMD
libraries can switch between ISAs or even
mix ISAs, but at least they have a chance
to do so.
3. Fine control over the access pattern.
Loop unrolling mandates a particular access
pattern. It may cover only a small fraction
of all the possible algorithms we need.
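To make point 2 concrete, here is a small sketch of how the instruction set ends up being a property of the build. It relies on the target macros that GCC and Clang predefine from flags such as `-mavx2` (`__AVX512F__`, `__AVX2__`, `__SSE2__`, `__ARM_NEON`); the function name `build_isa` is hypothetical.

```cpp
// Reports which SIMD extension the current translation unit was compiled
// for. The answer is fixed by compiler flags, not by anything the calling
// code can choose at run time.
constexpr const char* build_isa() {
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__SSE2__)
    return "SSE2";
#elif defined(__ARM_NEON)
    return "NEON";
#else
    return "unknown or scalar";
#endif
}
```

With code that is only unrolled, the same source yields whatever `build_isa()` reports for that build, whereas a library with explicit per-ISA types can at least offer a way to pick or mix targets within one program.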
--
Zhihao Yuan, ID lichray
The best way to predict the future is to invent it.
Received on 2020-05-16 16:56:17