Sparse matrix assembly on the GPU through multiplication patterns

R. Zayer; M. Steinberger; H. P. Seidel

doi:10.1109/HPEC.2017.8091057

Institute of Computer Graphics and Vision (7100)

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

Abstract

The numerical treatment of variational problems gives rise to large sparse matrices, which are typically assembled by coalescing elementary contributions. As the explicit matrix form is required by numerical solvers, the assembly step can be a bottleneck, especially in implicit and time-dependent settings where considerable updates are needed. On standard HPC platforms, this process can be vectorized by taking advantage of additional mesh querying data structures. However, on graphics hardware, vectorization is inhibited by limited memory resources. In this paper, we propose a lean unstructured mesh representation, which allows casting the assembly problem as a sparse matrix-matrix multiplication. We demonstrate how the global graph connectivity of the assembled matrix can be captured through basic linear algebra operations and show how local interactions between nodes/degrees of freedom within an element can be encoded by means of a concise representation, action maps. These ideas not only reduce the memory storage requirements but also cut down on the bulk of data that needs to be moved from global storage to the compute units, which is crucial on parallel computing hardware, and in particular on the GPU. Furthermore, we analyze the effect of mesh memory layout on the assembly performance.
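
As a rough illustration of the idea in the abstract, the following minimal Python sketch shows how the connectivity of an assembled matrix can be read off a product of incidence matrices: if E is the element-to-node incidence matrix, the nonzero pattern of EᵀE links exactly those nodes that share an element. This is not the authors' GPU implementation and it does not model their action maps. The function names, the toy two-triangle mesh, and the local matrix `k` are all assumptions made for the example.

```python
from collections import defaultdict


def connectivity_pattern(elements, num_nodes):
    """Nonzero pattern of E^T E, with E the element-to-node incidence matrix."""
    # Row i of E^T lists the elements that touch node i.
    node_to_elems = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems[n].append(e)
    # (E^T E)[i, j] is nonzero when nodes i and j share at least one element.
    pattern = []
    for i in range(num_nodes):
        cols = set()
        for e in node_to_elems[i]:
            cols.update(elements[e])
        pattern.append(sorted(cols))
    return pattern


def assemble(elements, local_matrices, num_nodes):
    """Coalesce elementary contributions into a row-wise sparse matrix."""
    rows = [defaultdict(float) for _ in range(num_nodes)]
    for nodes, k_local in zip(elements, local_matrices):
        for a, i in enumerate(nodes):
            for b, j in enumerate(nodes):
                rows[i][j] += k_local[a][b]  # duplicates are summed
    return rows


# Hypothetical toy mesh: two triangles sharing the edge (1, 2).
elements = [(0, 1, 2), (1, 3, 2)]
# Hypothetical local matrix, reused for both elements.
k = [[2.0, -1.0, -1.0], [-1.0, 2.0, -1.0], [-1.0, -1.0, 2.0]]

pattern = connectivity_pattern(elements, 4)
A = assemble(elements, [k, k], 4)
```

In this sketch the column indices of each assembled row coincide with the corresponding row of `pattern`, so the sparsity structure can be computed once from the mesh and reused whenever the element values change.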

Original language	English
Title of host publication	2017 IEEE High Performance Extreme Computing Conference (HPEC)
Pages	1-8
Number of pages	8
DOIs	https://doi.org/10.1109/HPEC.2017.8091057
Publication status	Published - 1 Sept 2017
Event	2017 IEEE High Performance Extreme Computing Conference: HPEC 2017, Westin Hotel, Waltham, United States. Duration: 12 Sept 2017 → 14 Sept 2017. http://www.ieee-hpec.org/2017/

Conference

Conference	2017 IEEE High Performance Extreme Computing Conference
Abbreviated title	HPEC ’17
Country/Territory	United States
City	Waltham
Period	12/09/17 → 14/09/17
Internet address	http://www.ieee-hpec.org/2017/

Keywords

data structures
graph theory
graphics processing units
linear algebra
mathematics computing
matrix multiplication
mesh generation
parallel processing
query processing
sparse matrices
GPU
assembled matrix
assembly performance
assembly problem
assembly step
basic linear algebra operations
elementary contributions
explicit matrix form
global graph connectivity
graphics hardware
lean unstructured mesh representation
memory storage requirements
mesh memory layout
mesh querying data structures
multiplication patterns
numerical solvers
numerical treatment
parallel computing hardware
sparse matrix assembly
sparse matrix-matrix multiplication
standard HPC platforms
variational problems
vectorization
Graphics processing units
Hardware
Matrices
Memory management
Sparse matrices
Standards

Access to Document

10.1109/HPEC.2017.8091057

Cite this

@inproceedings{1941f4d80d3e45b8b83fdc644c7fd117,

title = "Sparse matrix assembly on the GPU through multiplication patterns",

abstract = "The numerical treatment of variational problems gives rise to large sparse matrices, which are typically assembled by coalescing elementary contributions. As the explicit matrix form is required by numerical solvers, the assembly step can be a potential bottleneck, especially in implicit and time dependent settings where considerable updates are needed. On standard HPC platforms, this process can be vectorized by taking advantage of additional mesh querying data structures. However, on graphics hardware, vectorization is inhibited by limited memory resources. In this paper, we propose a lean unstructured mesh representation, which allows casting the assembly problem as a sparse matrix-matrix multiplication. We demonstrate how the global graph connectivity of the assembled matrix can be captured through basic linear algebra operations and show how local interactions between nodes/degrees of freedom within an element can be encoded by means of concise representation, action maps. These ideas not only reduce the memory storage requirements but also cut down on the bulk of data that needs to be moved from global storage to the compute units, which is crucial on parallel computing hardware, and in particular on the GPU. Furthermore, we analyze the effect of mesh memory layout on the assembly performance.",

keywords = "data structures, graph theory, graphics processing units, linear algebra, mathematics computing, matrix multiplication, mesh generation, parallel processing, query processing, sparse matrices, GPU, assembled matrix, assembly performance, assembly problem, assembly step, basic linear algebra operations, elementary contributions, explicit matrix form, global graph connectivity, graphics hardware, lean unstructured mesh representation, memory storage requirements, mesh memory layout, mesh querying data structures, multiplication patterns, numerical solvers, numerical treatment, parallel computing hardware, sparse matrix assembly, sparse matrix-matrix multiplication, standard HPC platforms, variational problems, vectorization, Graphics processing units, Hardware, Matrices, Memory management, Sparse matrices, Standards",

author = "Zayer, R. and Steinberger, M. and Seidel, {H. P.}",

year = "2017",

month = sep,

day = "1",

doi = "10.1109/HPEC.2017.8091057",

language = "English",

pages = "1--8",

booktitle = "2017 IEEE High Performance Extreme Computing Conference (HPEC)",

note = "2017 IEEE High Performance Extreme Computing Conference: HPEC 2017, HPEC {\textquoteright}17; Conference date: 12-09-2017 Through 14-09-2017",

url = "http://www.ieee-hpec.org/2017/",

}

TY - GEN

T1 - Sparse matrix assembly on the GPU through multiplication patterns

AU - Zayer, R.

AU - Steinberger, M.

AU - Seidel, H. P.

PY - 2017/9/1

Y1 - 2017/9/1

N2 - The numerical treatment of variational problems gives rise to large sparse matrices, which are typically assembled by coalescing elementary contributions. As the explicit matrix form is required by numerical solvers, the assembly step can be a potential bottleneck, especially in implicit and time dependent settings where considerable updates are needed. On standard HPC platforms, this process can be vectorized by taking advantage of additional mesh querying data structures. However, on graphics hardware, vectorization is inhibited by limited memory resources. In this paper, we propose a lean unstructured mesh representation, which allows casting the assembly problem as a sparse matrix-matrix multiplication. We demonstrate how the global graph connectivity of the assembled matrix can be captured through basic linear algebra operations and show how local interactions between nodes/degrees of freedom within an element can be encoded by means of concise representation, action maps. These ideas not only reduce the memory storage requirements but also cut down on the bulk of data that needs to be moved from global storage to the compute units, which is crucial on parallel computing hardware, and in particular on the GPU. Furthermore, we analyze the effect of mesh memory layout on the assembly performance.

AB - The numerical treatment of variational problems gives rise to large sparse matrices, which are typically assembled by coalescing elementary contributions. As the explicit matrix form is required by numerical solvers, the assembly step can be a potential bottleneck, especially in implicit and time dependent settings where considerable updates are needed. On standard HPC platforms, this process can be vectorized by taking advantage of additional mesh querying data structures. However, on graphics hardware, vectorization is inhibited by limited memory resources. In this paper, we propose a lean unstructured mesh representation, which allows casting the assembly problem as a sparse matrix-matrix multiplication. We demonstrate how the global graph connectivity of the assembled matrix can be captured through basic linear algebra operations and show how local interactions between nodes/degrees of freedom within an element can be encoded by means of concise representation, action maps. These ideas not only reduce the memory storage requirements but also cut down on the bulk of data that needs to be moved from global storage to the compute units, which is crucial on parallel computing hardware, and in particular on the GPU. Furthermore, we analyze the effect of mesh memory layout on the assembly performance.

KW - data structures

KW - graph theory

KW - graphics processing units

KW - linear algebra

KW - mathematics computing

KW - matrix multiplication

KW - mesh generation

KW - parallel processing

KW - query processing

KW - sparse matrices

KW - GPU

KW - assembled matrix

KW - assembly performance

KW - assembly problem

KW - assembly step

KW - basic linear algebra operations

KW - elementary contributions

KW - explicit matrix form

KW - global graph connectivity

KW - graphics hardware

KW - lean unstructured mesh representation

KW - memory storage requirements

KW - mesh memory layout

KW - mesh querying data structures

KW - multiplication patterns

KW - numerical solvers

KW - numerical treatment

KW - parallel computing hardware

KW - sparse matrix assembly

KW - sparse matrix-matrix multiplication

KW - standard HPC platforms

KW - variational problems

KW - vectorization

KW - Graphics processing units

KW - Hardware

KW - Matrices

KW - Memory management

KW - Sparse matrices

KW - Standards

U2 - 10.1109/HPEC.2017.8091057

DO - 10.1109/HPEC.2017.8091057

M3 - Conference paper

SP - 1

EP - 8

BT - 2017 IEEE High Performance Extreme Computing Conference (HPEC)

T2 - 2017 IEEE High Performance Extreme Computing Conference

Y2 - 12 September 2017 through 14 September 2017

ER -
