std-proposals: Re: A novel way to SIMD library

From: Ville Voutilainen <ville.voutilainen_at_[hidden]>
Date: Sat, 16 May 2020 14:10:07 +0300

I wonder what Dr. Kretz thinks about this.

On Sat, 16 May 2020 at 14:01, kate via Std-Proposals
<std-proposals_at_[hidden]> wrote:
>
> Hi there.
>
> I'd like to share an idea about building a SIMD library. I created a
> post on Reddit, and some readers said that it could be a viable
> replacement for the existing SIMD proposal. I am not an expert on
> this, so I'd like to know whether you think it really deserves a
> proposal.
>
> There are already tons of SIMD libraries, which usually export
> user-friendly interfaces by wrapping the underlying messy SIMD
> intrinsics. But I found a novel approach to implementing similar
> interfaces. The idea is very simple: unroll the loops introduced by
> vector operations at compile time and leave the remaining work, i.e.
> generating code that uses SIMD instructions, to the compiler. Then the
> vectorization is done. It relies on the compiler's ability to generate
> vectorized instructions from unrolled code. For modern compilers, this
> should be easy.
>
> A simple benchmark showed that code vectorized this way performed
> similarly to code using intrinsics.
>
> Its simplicity brings some significant advantages over traditional
> libraries:
>
> 1) Easy to implement and standardize. All it needs are template
> tricks. Unrolling can be realized using parameter pack expansion,
> while user-friendly interfaces rely on function templates and operator
> overloading. Neither intrinsics nor extra libraries are required.
>
> 2) Portability. Since no intrinsics are required, it can be used on
> various platforms.
>
> 3) Evolving efficiency. As compilers improve, the work delegated to
> them will be done better and better.
>
> 4) Never out of date. Every few years, CPU vendors add a batch of new
> SIMD instructions. A library built this way needs no modifications,
> because compilers will adapt to these changes and generate more
> efficient code.
>
> But it does have shortcomings. Experiments show that:
>
> 1) Performance varies across compilers. Different compilers generated
> code with different performance, and Clang did better than GCC.
>
> 2) It doesn't help much with simple code, as compilers already
> optimize such code very well.
>
> If this method becomes more mainstream, it will incentivise compiler
> developers and CPU vendors to ensure that the best SIMD instructions
> for each architecture are used for various transforms on homogeneous,
> contiguous data, which will only strengthen this technique's efficacy.
>
> That's all. I'm looking forward to your comments.
>
> Reddit Post:
> https://www.reddit.com/r/cpp/comments/ghh63z/a_novel_way_to_build_simd_library/
>
> A simple and incomplete implementation:
> https://github.com/eatingtomatoes/pure_simd
>
> Yours Sincerely,
> Zhang Zhang
> --
> Std-Proposals mailing list
> Std-Proposals_at_[hidden]
> https://lists.isocpp.org/mailman/listinfo.cgi/std-proposals

Received on 2020-05-16 06:13:21