Globally Homogeneous, Locally Adaptive Sparse Matrix-vector Multiplication on the GPU

Markus Steinberger, Rhaleb Zayer, Hans-Peter Seidel

Publication: Contribution to book/report/conference proceedings; conference paper; peer-reviewed

Abstract

The rising popularity of the graphics processing unit (GPU) across various numerical computing applications triggered a breakneck race to optimize key numerical kernels and, in particular, the sparse matrix-vector product (SpMV). Despite great strides, most existing GPU-SpMV approaches trade off one aspect of performance against another. They either require preprocessing, exhibit inconsistent behavior, lead to execution divergence, suffer from load imbalance, or induce detrimental memory access patterns. In this paper, we present an uncompromising approach for SpMV on the GPU. Our approach requires no separate preprocessing or knowledge of the matrix structure and works directly on the standard compressed sparse row (CSR) data format. From a global perspective, it exhibits homogeneous behavior, reflected in efficient memory access patterns and a steady per-thread workload. From a local perspective, it avoids heterogeneous execution paths by adapting its behavior to the workload at hand, uses an efficient encoding to keep temporary data requirements for on-chip memory low, and leads to divergence-free execution. We evaluate our approach on more than 2500 matrices, comparing it to vendor-provided and state-of-the-art SpMV implementations. Our approach not only significantly outperforms approaches that operate directly on the CSR format (20% average performance increase), but also outperforms approaches that preprocess the matrix, even when preprocessing time is discarded. Additionally, the same strategies lead to a significant performance increase when adapted for transpose SpMV.
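
For readers unfamiliar with the CSR format mentioned in the abstract, the following is a minimal sequential sketch of the textbook row-wise CSR SpMV and its transpose variant. It is not the authors' GPU algorithm; the function names `csr_spmv` and `csr_spmv_transpose` and the small example matrix are assumptions made purely for illustration. The per-row nonzero counts computed at the end hint at why a naive one-row-per-thread mapping can suffer from load imbalance.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (reference version)."""
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        acc = 0.0
        # Row `row` owns the nonzeros at positions [row_ptr[row], row_ptr[row + 1]).
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def csr_spmv_transpose(row_ptr, col_idx, values, x, num_cols):
    """Compute y = A^T @ x by scattering each nonzero into its column's output."""
    y = [0.0] * num_cols
    for row in range(len(row_ptr) - 1):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[col_idx[k]] += values[k] * x[row]
    return y


# Hypothetical example matrix (not from the paper):
# A = [[1, 0, 2],
#      [0, 0, 0],
#      [3, 4, 5]]
row_ptr = [0, 2, 2, 5]
col_idx = [0, 2, 0, 1, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]

ones = [1.0, 1.0, 1.0]
y = csr_spmv(row_ptr, col_idx, values, ones)
yt = csr_spmv_transpose(row_ptr, col_idx, values, ones, num_cols=3)

# Nonzeros per row: uneven counts are the source of per-row load imbalance.
row_lengths = [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]
```

In this example the rows hold 2, 0, and 3 nonzeros, so a thread assigned to the last row does more work than the others while the thread for the empty row idles; the abstract's "steady per-thread workload" refers to avoiding this kind of skew.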
Original language: English
Title: ICS '17: Proceedings of the International Conference on Supercomputing
Place of publication: New York, NY, USA
Publisher: ACM SIGWEB
Pages: 1–11
ISBN (Print): 978-1-4503-5020-4
Publication status: Published - 2017
Published externally: Yes
Event: International Conference on Supercomputing: ICS 2017 - Chicago, United States
Duration: 14 June 2017 to 16 June 2017

Conference

Conference: International Conference on Supercomputing
Short title: ICS '17
Country/Territory: United States
City: Chicago
Period: 14/06/17 to 16/06/17
