Matrix algebra framework for portable, scalable and efficient query engines for RDF graphs
Abstract
Existing query engines for RDF graphs follow one of two design paradigms: relational or graph-based. We explore sparse matrix algebra as a third paradigm and propose MAGiQ, a framework for implementing SPARQL query engines that are portable across various hardware architectures, scalable over thousands of compute nodes, and efficient for very large RDF datasets. MAGiQ represents the RDF graph as a sparse matrix and defines a domain-specific language of algebraic operations. SPARQL queries are translated into matrix algebra programs that are oblivious to the underlying computing infrastructure. The program is then executed by existing matrix algebra libraries, each optimized for a particular architecture, which handle the performance concerns. We present three case studies of matrix algebra back-end libraries: SuiteSparse, Matlab, and CombBLAS. We demonstrate how MAGiQ can be ported with little effort to a variety of architectures such as Intel CPUs, NVIDIA GPUs, and Cray XC40 supercomputers. Our experiments on large-scale real and synthetic datasets show that MAGiQ performs comparably to or better than existing specialized SPARQL query engines for data-intensive queries, scales to very large computing infrastructures, and handles datasets with up to 512 billion triples.
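To make the sparse-matrix idea concrete, here is a minimal, hypothetical sketch in plain Python. It is not MAGiQ's domain-specific language and does not call any of the back-end libraries named in the abstract. It assumes a dictionary-of-keys sparse matrix whose entry (s, o) holds a predicate ID, treats a triple pattern as an element-wise selection, and evaluates a two-pattern join on a shared variable as a sparse matrix product over the Boolean semiring. The helper names `build_matrix`, `select`, and `bool_matmul`, and the sample triples, are illustrative assumptions.

```python
from collections import defaultdict

# Toy encoding (an assumption, simplified): M[s][o] holds the integer ID of the
# predicate linking subject s to object o. Only one predicate per (s, o) pair is
# kept; a later triple on the same pair overwrites an earlier one.

def build_matrix(triples, ids):
    """Encode (subject, predicate, object) triples as a dict-of-dicts sparse matrix."""
    m = defaultdict(dict)
    for s, p, o in triples:
        m[ids[s]][ids[o]] = ids[p]
    return dict(m)

def select(m, pred_id):
    """Triple pattern (?x, p, ?y): keep entries equal to pred_id, as Boolean 1s."""
    out = {}
    for row, cols in m.items():
        kept = {c: 1 for c, v in cols.items() if v == pred_id}
        if kept:
            out[row] = kept
    return out

def bool_matmul(a, b):
    """Sparse product over the Boolean (OR, AND) semiring.

    Entry (i, j) is 1 iff some k has a[i][k] and b[k][j], i.e. it joins the
    object column of a with the subject row of b on the shared variable.
    """
    out = {}
    for i, row in a.items():
        reached = {}
        for k in row:
            for j in b.get(k, {}):
                reached[j] = 1
        if reached:
            out[i] = reached
    return out

triples = [
    ("alice", "knows", "bob"),
    ("bob", "worksAt", "acme"),
    ("carol", "knows", "dave"),
    ("dave", "worksAt", "initech"),
    ("alice", "worksAt", "acme"),
]
# One shared ID dictionary for entities and predicates (a simplification).
names = sorted({term for t in triples for term in t})
ids = {name: i for i, name in enumerate(names)}
M = build_matrix(triples, ids)

# SELECT ?x ?z WHERE { ?x knows ?y . ?y worksAt ?z }
result = bool_matmul(select(M, ids["knows"]), select(M, ids["worksAt"]))
pairs = {(names[i], names[j]) for i, row in result.items() for j in row}
print(sorted(pairs))  # [('alice', 'acme'), ('carol', 'initech')]
```

The dictionary loops only illustrate the algebra. In a design like the one the abstract describes, such selections and products would instead be handed to an architecture-specific library, which is where the performance would come from.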