If by “Sparse matrix” you mean a list or array of (source node, target node) pairs, then this is not consistent with the terminology in the Carnegie Mellon lecture I found. They would call that an “edge list” and define “adjacency array” in the way I did a few emails ago. In any case, terminology seems to be inconsistently used in literature and we need to define terms in the paper before we use them.
Also, we should consider supporting the thing the Carnegie Mellon lecture calls “adjacency array” (i.e. not the edge list), because that’s a very performant and cache-efficient layout for graphs that are read heavily and (almost) never written to.
‘Adjacency array’ is really just an adjacency sparse matrix.
We might want to consider using some standardized matrix storage object.
Sent from my iPhone
On Jul 20, 2019, at 2:40 AM, Sebastian Messmer via SG19 <firstname.lastname@example.org> wrote:
Upon further research, I found that the terms are used inconsistently in different sources.
The way I’ve learned and am using the terms is consistent with a Carnegie Mellon class I found: http://www.cs.cmu.edu/afs/cs/academic/class/15210-s12/www/lectures/lecture07.pdf
They have four graph layouts: adjacency matrix, adjacency list, adjacency array and edge list (edge list is the one where there’s just one array with all edges and each edge stores both corresponding vertex indices).
But I’ve also found sources that describe what I know as “adjacency array” but call it “adjacency list”: https://www.sciencedirect.com/topics/computer-science/adjacency-list
Though I don’t know what they call what I and the class linked above know as “adjacency list” then, maybe “linked adjacency list”?
Given this inconsistent situation, we should probably decide for a terminology and then define the terms in the paper before we use them.
I looked through the graph proposal paper and I think there might be some misunderstandings about what I meant when I talked about adjacency arrays in our phone call.
I found a picture online that might make things a bit more clear: https://www.researchgate.net/figure/The-adjacency-list-data-structure-used-in-our-implementation-shown-for-a-sample-graph_fig1_268988477
(they call it adjacency list in that link, but it’s actually an adjacency array).
An adjacency array stores all edges in one array, but sorted by vertex. There is an additional vertex array that stores pointers to its first edge in the edge array.
Say V is that vertex array and E is the edge array, then E[V[i] … V[i+1]-1] is the range storing all edges for vertex I, usually they store the vertex index of the vertex on the other side of the edge.
A directed graph that only needs to be forward-traversable does not store backwards edges, for each vertex the edge array only holds the out-edges.
A undirected graph stores each edge twice, once for each vertex.
A directed graph that need to be forward- and backward-traversable can be implemented by storing two indices in/out per vertex. E[V[i].in ... V[i].out-1] stores the in-edges and E[V[i].out ... V[i+1].in-1] stores the out-edges. This also means that each edge is stored twice.
Adjacency arrays are very fast and cache-efficient when traversed but slow when modified. They are often used for map data and route planning because map data doesn’t change that often.
Following this, there’s a few errors in the paper, for example:
- Inserting a new edge to a pre-existing node is O(number of edges), not O(1), because it needs to add a new entry to the middle of the edge array
- The paper claims “For instance, edges in an adjacency matrix should only be as big as the user-defined edge value, and for an adjacency array should be the size of user-defined edge value and in and out vertex references (vertex_id or pointer).”. Actually, an edge only stores only one vertex reference, not an in- and out-reference.