Wed 06 Mar 2019
C++ linear algebra phone meeting
Present:
- Bob Steagall (BS) (leading)
- Mark Hoemmen (MH)
- Jayesh Badwaik (JB)
- Cem Bassoy (CB)
- Klaus Iglberger (KI)
- Graham Lopez (GL)
- Andrew Lumsdaine (AL)
- Michael Wong (MW)
- Charles Bay (CB2)
MH: I met Jeff Hammond at SIAM CSE last week. He expressed the wish
that a linear algebra proposal include or at least be open to tensor
algebra. I suggested that he write a tensor proposal, since we didn't
have any at the Kona meeting and he is an expert. He wishes he had
time to write a tensor proposal.
CB: I have a couple suggestions for recent tensor papers to read.
1. "High-Performance tensor contraction without transposition,"
Davin Matthews. I think Jeff [Hammond] might have even worked
with him.
https://epubs.siam.org/doi/abs/10.1137/16M108968X
2. "Design of a High-Performance GEMM-like Tensor-Tensor
Multiplication," Paul Springer:
https://dl.acm.org/citation.cfm?id=3157733
---
BS: In my e-mail, I circulated a write-up by JB, called "Some
Observations on Implementation of P1385." Shall we begin by
discussing this paper?
JB: I think some of my points in the "some observations" paper were
addressed in the minutes from the Kona meeting; let's talk about that
first.
BS: I just got minutes from LEWG-I yesterday, so I haven't yet
incorporated their feedback into P1385. I will do so and then submit a new
version of P1385 for the post-Kona-meeting mailing. What we learned:
1. Indexing: Everyone strongly in favor of zero-based indexing.
2. Against separate row vector vs. column vector types; want one
vector type, with context-dependent orientation.
3. Star would default to inner; explicit outer product.
4. Explore Engine types, to implement in terms of mdspan, to get
reuse. Also need storage.
5. Explicitly like named functions, in addition to operators.
Andrew Lumsdaine, original author of the Matrix Template Library
(MTL), presented both in LEWG-I and at the SG14/19 combined session.
At end of day, after his MTL presentation, I debriefed with him. He
thinks our approach is good.
Vincent Reverdy raised a point about the naming of Vector -- see
review attached to meeting notice. "Hypermatrix" etc. He objects to
the name "vector." I have a different objection to the name "vector"
-- it would collide with std::vector -- unless we put it in a
different namespace. A proposal for named constants would live in the
std::math namespace, so perhaps we could use std::math for our vector
type.
MH: I brought up this issue with Jeff Hammond last week. JH pointed
out that there are three main communities of tensor users: general
relativity, quantum chemistry, and machine learning. Vincent does
general relativity. JH says that quantum chemistry and machine
learning have needs that are most alike. I'm not saying we should
marginalize Vincent's needs, but it could also be true that the
largest community of users is less concerned about careful
distinctions between hypermatrices, matrices, vectors, and tensors.
GL: Regardless of the tensor community's share of representation, it
seems that something like tensors could be built on top of what we're
doing here with the linear algebra proposal.
We don't need to design a tensor library at the same time. If we do a
good job with these lower-level designs, tensors could reuse all of
this. It's a tangential issue anyway.
JB: Linear algebra objects have very common names, like "vector."
This suggests that there should be a separate namespace for linear
algebra. I'm not sure how enthusiastic the Committee would be about
that. Did you get feedback?
BS: Titus Winters (chairman of LEWG) will push back against having new
namespaces. He doesn't want an explosion of subnamespaces. But if
you can make a convincing case, I don't think he would stand against
it. For example, a convincing case was made for the chrono namespace
and literals.
[AL & MW enter]
GL: If we need to push for a new namespace, we would have more weight
pushing for "math" than for [something like] "linear_algebra."
BS: That was feedback from LEWG-I and the SG14/19 combined meeting.
More feedback: Not reflected in poll but was in LEWG-I minutes: They
want P1385 to have more discussion of prior art. SG20 feedback: They
want a "Gor table" (see P1362R0, Section 4.4) -- showing feature
levels and anticipated user sophistication for each. SG20 would like
authors of new papers to include tables like this. We generally got
favorable feedback from both rooms. Work continues in the strawman
implementation repository. If anyone wants to try implementing
arithmetic traits, even in naive way that returns temporaries, please
feel welcome. Some people -- JB and KI already -- have already tried
implementing P1385.
MH: We plan to write a paper on a low-level interface, and want that
to be complementary to P1385. Christian Trott (CT) and I discussed
this with you at Kona.
BS: Great idea, would like to help, map primitive ops to corresponding
arithmetic traits, see how they fit.
MH: We don't have a proposal yet, and are not sure if we will make the
post-Kona-meeting mailing deadline, but do plan to submit at least for
the pre-Cologne-meeting deadline. We will keep in touch with you
about it.
---
BS: Let's talk about JB's write-up "some observations."
JB: I'll share my own screen. I started implementing this before the
Kona meeting, so this is based on P1385R0. First, in the Standard
Template Library, we have sort algorithm, can provide it with any kind
of container which has begin and end. Do we want similar thing with
the linear algebra library, so that users can give it different linear
algebra data structures?
AL: Yes, I think we need to. What made the STL successful was its
openness. Need not be based on iterators, could be generic indexing
syntax.
MH: First, we have the experience that users treat matrix data
structures as data containers to share between software modules. For
example, users use Trilinos' matrix data structures well beyond the
original intended use case. Thus, it would be good for a linear
algebra library to be able to work with users' data structures, rather
than mandating a particular data structure. Second, mdspan offers a
generic indexing syntax.
AL: There's a difference between users' use of a matrix, and
implementing the functions over it. mdspan could be a matrix at the
implementation level.
MH: I agree. Also, mdspan is not necessarily a matrix at the user
level, for reasons discussed in my Kona talk.
JB: Regarding engine promotion traits for arithmetic operations: if
Engine1 and Engine2 are standard types, then we can decide what to do.
If one is a user's type, the user can decide. What if both are
[different] user types? Then best is that no one would specify what
would happen in that case.
BS: Two situations there. JB's example: two different engine types,
one from library1 and one from library2. Implementers of the two
libraries don't know anything about each other. The user wants to use
the libraries together in a mixed expression. There is no predefined
way to determine how to promote.
JB: Yes, and best practice would be not to define it.
BS: Yes. In the strawman implementation, and how Guy and I envisioned
it, that would be a case where the library provides no guidance. No
default promotion. User would have to create their own traits and
decide how to promote. It might be a third engine type altogether.
JB: Even then, if library1 and library2 do different things, this might
introduce bugs in the code.
CB: I agree with Bob: it's the responsibility of the library user; we
can't ensure, except with concepts, that there is a certain type given
out.
JB: I agree with that. But even the user cannot do it very well --
they would have to redefine all the machinery on their own.
BS: Definitely that's true. However, today, if you had two linear
algebra libraries and you wanted to mix their types, could you even do
it?
JB: No, you couldn't.
BS: So, our hope is that at least by providing this traits mechanism,
permitting user specialization, it would help a sufficiently motivated
user solve this problem.
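The traits mechanism BS describes can be sketched in a few lines of
C++. This is only an illustration of the idea, not P1385 wording: the
names engine_promotion, my_mixed_engine, and the two library
namespaces are all made up for the example. The library's primary
template is deliberately left undefined (no default promotion), and a
sufficiently motivated user supplies a specialization naming a third
engine type:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical engine types from two unrelated vendors; neither
// library knows anything about the other.
namespace lib1 { struct dense_engine { }; }
namespace lib2 { struct sparse_engine { }; }

// Primary template left undefined on purpose: the library provides
// no default promotion for two unrelated user engines.
template <class E1, class E2>
struct engine_promotion;

// The *user* decides: mixing the two engines yields a third,
// user-chosen engine type.
struct my_mixed_engine { };

template <>
struct engine_promotion<lib1::dense_engine, lib2::sparse_engine> {
    using engine_type = my_mixed_engine;
};

// A mixed expression would consult the trait for its result engine.
using mixed_result =
    engine_promotion<lib1::dense_engine, lib2::sparse_engine>::engine_type;

static_assert(std::is_same_v<mixed_result, my_mixed_engine>,
              "user specialization selects the promoted engine");
```

Mixing two engines for which no specialization exists simply fails to
compile, which matches the "no guidance from the library" position.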
JB: I think I have a solution to this issue, which I'll present. Keep
that in mind. See Point (3) of my paper.
BS: About your Point (4): It's not clear that we want to require that
elements are fields. Some people wanted quaternions, which do not
obey all the field axioms.
MH: Also, some people want to express graph algorithms using linear
algebra operations [see e.g., the GraphBLAS effort]. In that case,
the ring axioms or even less suffice for matrix entries.
JB: That takes care of Point (5). Point (6) handled in meeting
minutes. Point (7): minutes said that Kronecker vs. outer product are
different. I'm still not decided on the position.
BS: Vector / matrix of dimension 1 / 1x1, mathematically, is
synonymous with an instance of the element type. Problem is getting
C++ type system to conform.
[Charles Bay (CB2) joins]
BS: If fixed size engine, size 1, then can have conversion operator,
no performance cost.
MH: That may be true mathematically, but may not be true in the
computer science sense. A scalar value is immediately available.
Accessing an entry in a matrix may require dereferencing a pointer, or
something even more complicated [when using mdspan with a custom
access policy]. The referenced datum may be expensive to access --
for example, it may live on a GPU.
BS: For static matrix storage, should be minimum run-time cost. For
dynamic storage, would require run-time check. This is one reason why
we pushed for row vs. column vectors to have different types -- so
that the multiply operation could return a scalar in that case,
instead of a 1x1 matrix.
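BS's fixed-size point can be sketched as a conversion operator that is
enabled only for the 1x1 case, so it costs nothing at run time for
static storage. This is a hypothetical illustration (fs_matrix is an
invented name, not a proposed type):

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

// Fixed-size, statically stored matrix, row-major.
template <class T, std::size_t R, std::size_t C>
class fs_matrix {
public:
    T& operator()(std::size_t r, std::size_t c)       { return d_[r * C + c]; }
    T  operator()(std::size_t r, std::size_t c) const { return d_[r * C + c]; }

    // Conversion to the element type, available only when the extents
    // are known at compile time to be 1x1.
    template <std::size_t R2 = R, std::size_t C2 = C,
              std::enable_if_t<R2 == 1 && C2 == 1, int> = 0>
    operator T() const { return d_[0]; }

private:
    std::array<T, R * C> d_{};
};
```

For a dynamically sized matrix the same conversion would need a
run-time extent check, as noted above.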
MH: Per the GPU example, we may not want a 1x1 to behave like a
scalar value in all cases.
JB: I'll show how I dealt with the problem of different traits /
types. addition_trait, subtraction_trait, negation_trait. On
right-hand side, I have a custom class (jvector), in my own namespace
(testlac). I gave jvector an execution policy (EP) template
parameter. All types that want to interact with the standard library
should have this parameter. Then we can specialize the traits; they
only define result type, not how to compute. Then I defined separate
traits classes, called "engines," templated on EP, that define how to
compute e.g., negation, addition, and subtraction. I specialize the
engines for my own execution policy. I show different execution
policies; some can be provided by default. Now, since execution
policy is just a tag, we can demand a templated move constructor, that
moves a jvector with one policy into a jvector with another policy.
That's a cheap conversion from one to another. Now we can demand that
for any operations, the two execution policies must be the same. Now
all the arithmetic operations can work. Addition engine can just call
already implemented addition engines, if they want to reuse something
that's already been done. I've been able to have, for example, a
"verbose" policy [representing a custom user policy] and a
system-provided policy, with a static vector templated on the
execution policy and other stuff, and a different vector class, and I
can combine them in arithmetic operations. Another advantage: suppose
you have a tridiagonal matrix, but no arithmetic -- then you can
define your own execution policy, with existing containers, and you
can specialize that matrix multiplication operation and use it in your
code. Main idea is that we should require that every type which wants
to deal with our linear algebra library, should have this template
parameter. Need more experiments to see whether this works. See code
in my GitHub repository.
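A compressed sketch of the design JB walks through, assuming names
that follow his description (jvector, addition_trait, addition_engine,
default_policy, verbose_policy) but are not taken from his actual
code: the policy is a tag on the container type, traits name only the
result type, engines supply the computation per policy, and a
templated move constructor re-tags storage cheaply:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct default_policy { };
struct verbose_policy { };  // stands in for a custom user policy

// Vector type carrying an execution-policy tag parameter.
template <class T, class EP>
class jvector {
public:
    explicit jvector(std::vector<T> d) : data_(std::move(d)) { }

    // Cheap templated move constructor: re-tag the same storage
    // with a different execution policy.
    template <class OtherEP>
    jvector(jvector<T, OtherEP>&& other) : data_(std::move(other.data_)) { }

    T const& operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return data_.size(); }

    template <class U, class E> friend class jvector;

private:
    std::vector<T> data_;
};

// Trait: defines only the result type, not how to compute.
template <class T, class EP>
struct addition_trait {
    using result_type = jvector<T, EP>;
};

// Engine: defines how to compute, specialized per policy.
template <class EP>
struct addition_engine;

template <>
struct addition_engine<default_policy> {
    template <class T>
    static jvector<T, default_policy>
    add(jvector<T, default_policy> const& a,
        jvector<T, default_policy> const& b) {
        std::vector<T> r(a.size());
        for (std::size_t i = 0; i < a.size(); ++i) r[i] = a[i] + b[i];
        return jvector<T, default_policy>(std::move(r));
    }
};

// Arithmetic demands that both operands share one execution policy.
template <class T, class EP>
typename addition_trait<T, EP>::result_type
operator+(jvector<T, EP> const& a, jvector<T, EP> const& b) {
    return addition_engine<EP>::add(a, b);
}
```

A vector tagged with verbose_policy can be moved into a
default_policy vector and then combined; adding two verbose_policy
vectors directly would not compile here, since no engine is
specialized for that policy.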
MH: JB, you require that the matrix or vector type take this
"execution policy" template parameter. What if a given execution
policy is not compatible with my matrix type? For example, I may have
a matrix that stores data on the GPU, so that it's not even correct to
execute on the CPU.
JB: First, could use static_assert to restrict type, but then your
class would need to know what to exclude.
CB: Eigen's tensor library has a solution to this. It uses expression
templates for that. You don't clip [attach] the execution policy to
the matrix or vector -- instead, you say to the expression that you
want to have it executed on the GPU, for instance. C.on(gpu) = A+B.
When operator= evaluates the expression, the ".on(gpu)" tells it to
run on the GPU.
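The Eigen-style idea CB describes, with the execution target bound to
the assignment rather than to the container, can be sketched roughly
as below. All names are illustrative, and no real device dispatch
happens; a real library would specialize the evaluation step on the
Device tag:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct cpu_tag { };
struct gpu_tag { };  // stand-in tag only; no real GPU code here

// Unevaluated expression node for A + B.
template <class T>
struct sum_expr {
    std::vector<T> const& a;
    std::vector<T> const& b;
};

template <class T>
sum_expr<T> operator+(std::vector<T> const& a, std::vector<T> const& b) {
    return {a, b};
}

// Result of "on(C, device)": binds the output and the target device.
template <class T, class Device>
struct bound_target {
    std::vector<T>& out;

    // operator= evaluates the expression; a real library would
    // dispatch on Device here. This sketch always runs on the CPU.
    void operator=(sum_expr<T> const& e) {
        out.resize(e.a.size());
        for (std::size_t i = 0; i < out.size(); ++i)
            out[i] = e.a[i] + e.b[i];
    }
};

// Free-function stand-in for the member syntax "C.on(gpu)".
template <class Device, class T>
bound_target<T, Device> on(std::vector<T>& out, Device) {
    return {out};
}
```

Usage mirrors the "C.on(gpu) = A + B" form: the choice of device
attaches to the expression's evaluation, not to the matrix type.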
BS: MH and Christian Trott in Kona suggested a similar syntax for
this. I agree that it would make sense for our proposal, and it seems
easy to do. JB, I like exec policy, ties into that conversation, but
I haven't had time to study your code since the latest developments.
MH: It sounds like the "C.on(gpu)" syntax, and JB's move constructor
(that is meant to allow changing the execution policy cheaply) are
trying to address the same issue.
CB (?): The problem is that there are two competing execution policies
with the Engine. I don't know how to solve; gets complicated... I
would rather like to have an expression to be executed on the GPU.
JB: I thought about this. The issue is that the types of the matrix or
vector can also affect how you want to execute -- e.g., sparse matrix
needs different iteration. Policy is based on type as well, not just
expression.
MH: JB, your "execution policy" conflates _where_ to run, vs. how to
iterate or access the matrix.
JB: They are conflated. [Sparse matrix example.]
MH: I could see that, e.g., for hardware that has a special execution
path for sparse matrix-vector multiply.
JB: This is my solution for the traits not matching.
BS: Any more questions or comments?
MH: We just need to look a bit more.
BS (? or JB): Should I continue?
MH: Yes, it's good for more of us to try implementing this, so we can
discover any issues.
BS: I'd like some time to study what you [JB] have, and to combine
that with what I've learned from Mark and Christian and Graham, and
see how that impacts our paper. JB, if you have suggestions for
execution policies in our paper, we'd love to hear them. Keep
exploring.
---
BS: Next steps for P1385: Incorporate Kona feedback into post-meeting
paper, now extra avenue of research raised by JB to think about before
Cologne.
MH: [discusses plan for low-level linear algebra interface paper, to
be complementary to P1385]
BS: Succession papers? MH mentioned one, JB may decide to write one
too, up to you. Guy and I would like to encourage people to write
papers. We want to build linear algebra momentum. For example, a
succession paper could propose an interface for doing matrix
decompositions [like LU or QR]; here's what the input and output types
would look like, based on 1385. Comments or discussion?
CB: Have you thought about views? Sections or ranges of matrices and
vectors?
BS: Yes, at bottom of D1385R1, Section 9, third bullet, would address
that. Could have views of a vector, matrix, etc. Does that answer
your question?
CB: Yes. I think we need that.
AL: This is where mdspan becomes very powerful. You can define a
separately indexable thing over the same data. LAPACK and standard
textbook linear algebra algorithms require / are written in terms of
subviews. Also needed for blocking and parallelization.
BS: I agree with AL. This would be a perfect way to test mdspan in
terms of two nonowning view types -- one const, one mutable -- need
the latter for decompositions. Definitely on the TODO list.
CB: I could help with that; I've implemented such things.
BS: OK, connect with me offline.
MW: You technically don't have to go through LEWG-I [for review]. I
spoke with (? ...) about this, since there has been some confusion.
It's normal for papers that pass SG review to go straight to LEWG or
EWG. Thus, it's up to you, Bob, to decide whether you get periodic
[LEWG-I] reviews. Normally, a paper that comes out of SG14 or SG19
would go directly to LEWG or EWG. But thank you, I'm really glad of
the implementation that is coming out of JB's work.
AL: A course I've been teaching at UW has a simple linear algebra
library compatible with the design being proposed here.
MW: Good validation.
---
BS: Open floor to misc topics.
MH: Bob, you mentioned getting feedback on P1385 asking for prior art.
I do plan on expanding the history paper (P1417) to include new
material, based on both what I've found and what AL sent me. However,
I imagine that they would want P1385 to do more than just review
existing work -- they would want to see how that work connects with
your design. This would mean that an expansion of P1417 wouldn't
suffice; you would also need to add material to P1385 relating to your
design.
[Bob acknowledges this.]