Date: Sun, 23 Jun 2019 18:25:46 -0400
Hi Mark,
For the current low-level BLAS interface, I believe leaving out tags is a fair decision. I was expecting a much higher-level linear algebra library as a result of the SG14 effort, and I am sure many others were as well. I do hope the linear algebra efforts do not stop there, and we get “Matlab for C++” someday 😉.
Reading the proposals through the lens of “does this work with quaternions”, I do not see major issues. Most functions could simply be renamed for their quaternion counterparts. One example of required flexibility is accessor_conjugate. The following code specializes the accessor, so quaternion support should be possible.
// Lazily conjugated quaternion: conversion to T negates the vector
// part (val[0..2]) and keeps the scalar part (val[3]).
template<class T>
class conjugated_quaternion {
public:
  explicit conjugated_quaternion(T v) : val(v) {}
  operator T() const { return { -val[0], -val[1], -val[2], val[3] }; }
  /* ... */
private:
  T val;
};

// Specialization of accessor_conjugate for quaternion elements
// (std::quaternion_tag is hypothetical).
template<class Accessor, class T>
class accessor_conjugate<Accessor, std::quaternion_tag<T>> {
public:
  using reference =
    conjugated_quaternion<typename Accessor::reference>;
  using offset_policy =
    accessor_conjugate<typename Accessor::offset_policy,
                       std::quaternion_tag<T>>;
  /* ... */
};
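For illustration, here is roughly how I would expect this to be used. Purely a sketch: std::quaternion, quaternion_tag, and a conjugate_view() helper that selects the specialization above are all assumptions on my part, not part of the current proposal.

// Hypothetical usage: view a quaternion matrix so that reads return
// the conjugate, without modifying the underlying storage.
std::quaternion<float> data[16];
basic_mdspan<std::quaternion<float>, extents<4, 4>> A{data};
auto Ac = conjugate_view(A);          // accessor_conjugate<..., quaternion_tag<float>>
std::quaternion<float> q = Ac(0, 1);  // converts through conjugated_quaternion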
The proposal briefly mentions SIMD but doesn’t go into detail, and neither does the basic_mdarray proposal. Do you expect each algorithm to apply its own SIMD optimizations individually? This is more of an issue for the basic_mdarray proposal, but I believe some explicit support may be required to get the extra performance benefits.
Suppose an in_vector_t is stored on the stack and already loaded into registers: how could a function deduce this? Chained operations would then load and store at every call, which works but is not ideal. Has there been any review of basic_mdarray by the P0214R9 authors?
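To make the question concrete, here is the kind of kernel I have in mind. A minimal sketch assuming P0214’s std::experimental::simd and contiguous, unit-stride vector views; the simd_dot name is mine, not the proposal’s.

#include <cstddef>
#include <experimental/simd>
namespace stdx = std::experimental;

// Explicitly vectorized dot product over contiguous vector views.
// Assumes element_aligned loads are valid for the underlying storage.
template<class InVec1, class InVec2, class T>
T simd_dot(InVec1 x, InVec2 y, T init) {
  using V = stdx::native_simd<T>;
  T sum = init;
  std::size_t i = 0;
  const auto n = static_cast<std::size_t>(x.extent(0));
  for (; i + V::size() <= n; i += V::size()) {
    V vx(&x(i), stdx::element_aligned);  // load V::size() elements of x
    V vy(&y(i), stdx::element_aligned);  // load V::size() elements of y
    sum += stdx::reduce(vx * vy);        // horizontal add of the products
  }
  for (; i < n; ++i)                     // scalar tail
    sum += x(i) * y(i);
  return sum;
}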
I’m also surprised at the absence of any mention of GPU compute. Matrix operations are, after all, the “raison d’être” of GPUs (amongst other things). I would expect the paper to address the subject, even if only to say “yes, we have thought about it, and yes, it will work” 😉. The machine learning community will want to be reassured as well.
As far as I can tell, mdspan is actually well suited for this. Moreover, there already seem to be OpenCL/CUDA implementations of the BLAS (https://github.com/CNugteren/CLBlast). Have you looked into SYCL (https://www.khronos.org/sycl/), its Parallel STL (https://github.com/KhronosGroup/SyclParallelSTL), and the Codeplay implementation (https://developer.codeplay.com/products/computecpp/ce/guides/sycl-guide/hello-sycl)?
In theory, you should be able to request some GPU memory buffers, map mdspans onto them, and call the interface with an appropriate execution_policy. However, this may be wishful thinking, and it is certainly unsafe. Some extra features may be required in basic_mdspan and/or the interface to allow better integration with SYCL.
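Something like the following is what I am imagining. Purely a sketch, and not safe as written: I use SYCL USM (malloc_device) for brevity, and the gpu_policy execution policy is hypothetical; only the matrix_vector_product call is in the proposal’s style.

#include <sycl/sycl.hpp>

void gpu_gemv_sketch(std::size_t n) {
  sycl::queue q{sycl::gpu_selector_v};

  // Allocate device memory and view it through basic_mdspan. Nothing
  // here ties the mdspan's lifetime or accessibility to the device.
  float* d_A = sycl::malloc_device<float>(n * n, q);
  float* d_x = sycl::malloc_device<float>(n, q);
  float* d_y = sycl::malloc_device<float>(n, q);

  basic_mdspan<float, extents<dynamic_extent, dynamic_extent>> A{d_A, n, n};
  basic_mdspan<float, extents<dynamic_extent>> x{d_x, n};
  basic_mdspan<float, extents<dynamic_extent>> y{d_y, n};

  matrix_vector_product(gpu_policy{q}, A, x, y);  // y = A * x on the device

  sycl::free(d_A, q);
  sycl::free(d_x, q);
  sycl::free(d_y, q);
}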
At the very least, the fact that the BLAS has already been implemented on GPUs gives confidence in the design choices.
As a user, I am quite delighted with the proposal. I think this is a major step forward for C++ and will be quite useful for the language. Mdspan and mdarray are also major additions. I doubt I will interact with the library directly in my day-to-day work, but creating wrapper libraries on top of it seems quite straightforward. Hopefully it gets quaternion support some day.
Thumbs up from me 👍
Philippe
From: Hoemmen, Mark
Sent: Thursday, June 20, 2019 6:05 PM
To: Gael Guennebaud; sg14_at_[hidden]
Cc: p groarke
Subject: Re: [EXTERNAL] Re: [SG14] Linear algebra library proposal
Greetings and thanks for the feedback!
On 6/18/19, 4:08 PM, "Gael Guennebaud" <gael.guennebaud_at_[hidden]> wrote:
> As the main developer of the Eigen library I follow those LA proposals with great interest. I do support the idea of a low-level C++ BLAS-like API as I ended up writing something similar within Eigen’s internals.
We thought of libraries like Eigen as a big target audience for this proposal, so it's good to hear that. P1674 explains how we ended up developing similar (but uglier) libraries in Trilinos.
> I appreciated reading the detailed discussions on design choices, but I did not find any on the “Side” parameters. In my opinion, they are dangerous and error-prone because they make it harder to parse which operation is going to be performed (do you have to reverse the arguments or not?), and they are also (slightly) ambiguous (which one is on the left/right of the other?). So I strongly advise finding a way to get rid of the “Side” parameters.
This is helpful feedback -- thanks! I can see the value of having the argument order express the order of mathematical operations, rather than introducing a tag or enum.
> To this end, the most obvious solution is probably to follow your “option 2” of paragraph "Retain view functions?" [1]. This option also makes it easier to generalize beyond BLAS limits, but that’s another topic!
I've CC'd Philippe because both of you gave feedback along these lines. Philippe asked whether we've considered tag dispatching for algorithm overloads, as a way to add overloads for e.g., quaternions. You suggested property tagging as a way to avoid a "Side" argument, by marking a matrix as "triangular" etc.
We went back and forth about a property-tagging-based design. In the end, we decided against it. The main issue we see is that property tags "look like" iterator categories, but don't behave like them. You can write generic algorithms using iterator categories, but you can't write generic algorithms using arbitrary mathematical properties. (Just imagine the many generalizations of symmetry, for example.) This would make our proposal into a bunch of customization points, instead of a bunch of generic algorithms with high-performance implementations. In our view, a library customizable on mathematical properties is a high-level library, something more like Eigen and less like the BLAS. We wanted to be the low-level thing that higher-level code calls to get good performance and reasonable accuracy.
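(To make that concrete, here is roughly what a property-tag design would force on us; the tag names below are hypothetical. Each property becomes its own overload, and nothing generic can exploit a property it has never seen:)

// Hypothetical property-tag design: every recognized property
// becomes a separate overload / customization point.
struct general_tag {};
struct upper_triangular_tag {};
struct symmetric_tag {};

template<class InMat, class InVec, class OutVec>
void matrix_vector_product(general_tag, InMat A, InVec x, OutVec y);

template<class InMat, class InVec, class OutVec>
void matrix_vector_product(upper_triangular_tag, InMat A, InVec x, OutVec y);

// A user-defined property (say, a persymmetric_tag) would need its own
// overload. Contrast iterator categories, where a generic algorithm
// still works on an unfamiliar iterator -- just possibly slower.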
Anyway, I keep defending the proposal instead of soliciting more feedback ;-) . How do you feel about this? Does the lack of property tagging and custom mathematical properties make this proposal a "no" vote for you? Does this "low-level" / "high-level" architecture make sense, or would it be better for a single library to try to cover both?
Thanks!
mfh