Date: Wed, 8 Nov 2023 16:14:15 -0500
Ah, beautiful! Looks like all of my concerns are mirrored :)
Was that... authored today? What a coincidence...
On Wed, Nov 8, 2023 at 4:09 PM Ville Voutilainen <
ville.voutilainen_at_[hidden]> wrote:
> On Wed, 8 Nov 2023 at 23:06, Tor Shepherd via Std-Proposals
> <std-proposals_at_[hidden]> wrote:
> >
> > Hi,
> >
> > I've been playing around a lot with the awesome std::experimental::simd
> library in GCC >11, but it's a bit error prone to iterate over a container
> by SIMD chunks:
> >
> > using intv = std::experimental::fixed_size_simd<int, 4>;
> >
> > std::vector<int> v{1, 2, 3, 4, 5, 6, 7};
> > // am I sliding over by the right amount? Going past the end?
> > for (size_t i = 0U; i < v.size(); i += intv::size())
> > {
> > intv block{&v[i], std::experimental::element_aligned}; // am I
> accessing out of bounds?
> > std::print("{}, {}, {}, {}\n", block[0], block[1], block[2],
> block[3]);
> > }
> > // prints:
> > // 1, 2, 3, 4
> > // 5, 6, 7, ?
> >
> > This example code would access past-the-end of the vector with that end
> condition. However, if there were a simd_iterator defined, you could do
> this more safely:
> >
> > std::vector<int> v{1, 2, 3, 4, 5, 6, 7};
> > for (auto blockIter = simd_begin<intv>(v, -1); blockIter <
> simd_end<intv>(v); ++blockIter)
> > {
> > intv block = *blockIter;
> > std::print("{}, {}, {}, {}\n", block[0], block[1], block[2],
> block[3]);
> > }
> > // prints:
> > // 1, 2, 3, 4
> > // 5, 6, 7, -1
> >
> > This iterator would just wrap the logic from before, keeping track of
> the missing remainder (similar to how strided_view works), and then for
> the last SIMD block it could do a masked load from only the valid memory,
> setting the uninitialized padding values to a user-supplied default.
> >
> > If we have a simd_begin and simd_end defined, it's only reasonable to
> have a views::by_simd adaptor:
> >
> > std::vector<int> v{1, 2, 3, 4, 5, 6, 7};
> > for (auto block : v | views::by_simd<intv>(-1))
> > {
> > std::print("{}, {}, {}, {}\n", block[0], block[1], block[2],
> block[3]);
> > }
> > // prints:
> > // 1, 2, 3, 4
> > // 5, 6, 7, -1
> >
> > From what I've seen, this "SIMD-chunk over a contiguous range" is a very
> common use case for SIMD.
> >
> > Is there interest in this idea?
> >
> > As a side effect, this enables some cool usage with standard algorithms:
> >
> > template<typename T>
> > T simd_inner_product(std::span<T> a, std::span<T> b) {
> > using S = std::experimental::native_simd<T>;
> > return std::experimental::reduce(
> > std::inner_product(simd_begin<S>(a, 0), simd_end<S>(a),
> simd_begin<S>(b, 0), S{}));
> > }
> >
> > On my system (tigerlake, gcc10, -mavx2 -mfma, T = float) this is about
> 3x faster than std::inner_product (though I could have done something wrong
> to prevent GCC from autovectorizing)
> >
> > (If this is not the right forum to share this idea, I apologize 😅.
> Please let me know where to direct my proposal)
> >
> > Thanks for reading,
>
> Please check https://isocpp.org/files/papers/D3024R0.html
> and talk to its authors.
>
--
Tor Shepherd
M.S. Robotics | University of Michigan, Ann Arbor
B.S. Mechanical Engineering | University of Illinois, Urbana-Champaign
LinkedIn <https://www.linkedin.com/in/tor-shepherd-294644148/> | GitHub <https://github.com/torshepherd>
Received on 2023-11-08 21:14:27