Subject: Re: [std-proposals] A novel way to SIMD library
From: Ville Voutilainen (ville.voutilainen_at_[hidden])
Date: 2020-05-16 06:10:07
I wonder what Dr. Kretz thinks about this.
On Sat, 16 May 2020 at 14:01, kate via Std-Proposals
> Hi there.
>
> I'd like to share with you an idea about building a SIMD library. I
> created a post on Reddit, and some readers said it could be a viable
> replacement for the existing SIMD proposal. I am not an expert on this,
> so I'd like to know whether you think it really deserves a proposal.
>
> There are already tons of SIMD libraries, which usually export
> user-friendly interfaces by wrapping the underlying messy SIMD
> intrinsics. But I found a novel approach to implementing similar
> interfaces. The idea is very simple: unroll the loops introduced by
> vector operations at compile time, and leave the remaining work, i.e.
> generating code that uses SIMD instructions, to the compiler. Then the
> vectorization is done. It relies on the compiler's ability to generate
> vectorized instructions from unrolled code. For modern compilers, this
> should be easy.
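
A minimal sketch of the core idea, assuming C++14 and a fixed-size std::array. The names add and add_impl are hypothetical illustrations, not taken from the post's implementation: the element-wise loop is replaced by a parameter pack expansion over std::index_sequence, so the compiler sees N independent scalar additions that it may pack into SIMD registers.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Hypothetical helper: the loop
//   for (i = 0; i < N; ++i) out[i] = a[i] + b[i];
// written as a parameter pack expansion over I = 0, 1, ..., N-1.
// After expansion the compiler sees N independent scalar additions and is
// free to combine them into SIMD instructions; no intrinsics appear here.
template <typename T, std::size_t N, std::size_t... I>
constexpr std::array<T, N> add_impl(const std::array<T, N>& a,
                                    const std::array<T, N>& b,
                                    std::index_sequence<I...>) {
    // Expands to { T(a[0] + b[0]), T(a[1] + b[1]), ..., T(a[N-1] + b[N-1]) }.
    // The cast back to T avoids a narrowing error for types such as short,
    // which are promoted to int by the addition.
    return {{ static_cast<T>(a[I] + b[I])... }};
}

// Hypothetical public entry point: generates the index pack 0..N-1 at
// compile time and forwards to the unrolled implementation.
template <typename T, std::size_t N>
constexpr std::array<T, N> add(const std::array<T, N>& a,
                               const std::array<T, N>& b) {
    return add_impl(a, b, std::make_index_sequence<N>{});
}
```

For example, calling add on two std::array<float, 8> values yields eight independent float additions in the generated code; whether they become a single vector instruction depends on the compiler, the optimization level, target flags such as -march, the element type and N.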
>
> A simple benchmark showed that code vectorized this way had performance
> similar to code using intrinsics.
>
> Its simplicity brings some significant advantages over those traditional
> libraries:
> 1) Easy to implement and standardize. All it needs are template tricks:
> unrolling can be realized using parameter pack expansion, while
> user-friendly interfaces depend on template functions and operator
> overloading. Neither intrinsics nor extra libraries are required.
> 2) Portability. Since no intrinsics are required, it can be used on
> various platforms.
> 3) Evolving efficiency. As compilers constantly improve, the work that
> falls on them will be done better and better.
> 4) Never out of date. Every few years, CPU vendors add a batch of new
> SIMD instructions. A library built this way needs no modifications,
> because the compilers will adapt the generated code to these changes
> and produce more efficient code.
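
A sketch of the user-facing side described in item 1, assuming C++17 (fold expressions and implicitly constexpr lambdas). The type vec and the helpers zip, dot_impl and dot are hypothetical illustrations, not the post's API: each operator is an ordinary template function whose body expands to unrolled scalar code, with no intrinsics and no extra library.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Hypothetical fixed-size vector type; name and layout are illustrative.
template <typename T, std::size_t N>
struct vec {
    std::array<T, N> data;

    constexpr T operator[](std::size_t i) const { return data[i]; }
};

namespace detail {
// Applies `op` element-wise; the pack expansion unrolls it into
// op(a[0], b[0]), op(a[1], b[1]), ..., op(a[N-1], b[N-1]).
template <typename T, std::size_t N, typename Op, std::size_t... I>
constexpr vec<T, N> zip(const vec<T, N>& a, const vec<T, N>& b, Op op,
                        std::index_sequence<I...>) {
    return {{{ static_cast<T>(op(a.data[I], b.data[I]))... }}};
}

// Computes a[0]*b[0] + a[1]*b[1] + ... as a left fold expression.
template <typename T, std::size_t N, std::size_t... I>
constexpr T dot_impl(const vec<T, N>& a, const vec<T, N>& b,
                     std::index_sequence<I...>) {
    return (T{} + ... + static_cast<T>(a.data[I] * b.data[I]));
}
}  // namespace detail

// Operator overloading gives the "user-friendly" syntax: u + v, u * v.
template <typename T, std::size_t N>
constexpr vec<T, N> operator+(const vec<T, N>& a, const vec<T, N>& b) {
    return detail::zip(a, b, [](T x, T y) { return x + y; },
                       std::make_index_sequence<N>{});
}

template <typename T, std::size_t N>
constexpr vec<T, N> operator*(const vec<T, N>& a, const vec<T, N>& b) {
    return detail::zip(a, b, [](T x, T y) { return x * y; },
                       std::make_index_sequence<N>{});
}

// Horizontal reduction, also fully unrolled.
template <typename T, std::size_t N>
constexpr T dot(const vec<T, N>& a, const vec<T, N>& b) {
    return detail::dot_impl(a, b, std::make_index_sequence<N>{});
}
```

One design caveat: for floating-point T, the fold in dot fixes the order of the additions, and compilers typically keep that order unless reassociation is allowed (for example with -ffast-math), which can limit how much of the reduction gets vectorized.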
>
> But it does have shortcomings. Experiments show that:
> 1) Performance varies across compilers. Different compilers generated
> code with different performance, and Clang did better than GCC.
> 2) It doesn't help much with simple code, since compilers already
> optimize such code very well.
>
> If this method becomes more mainstream, it will incentivise compiler
> developers and CPU vendors to ensure that the best SIMD instructions for
> each architecture are used for various transforms on homogeneous,
> contiguous data, which will only strengthen this technique's efficacy.
>
> That's all. I'm looking forward to your comments.
> Reddit Post:
> A simple and incomplete implementation:
> Yours Sincerely,
> Zhang Zhang