TY - CONF
T1 - Sparse matrix assembly on the GPU through multiplication patterns
AU - Zayer, R.
AU - Steinberger, M.
AU - Seidel, H. P.
PY - 2017/9/1
Y1 - 2017/9/1
N2 - The numerical treatment of variational problems gives rise to large sparse matrices, which are typically assembled by coalescing elementary contributions. As the explicit matrix form is required by numerical solvers, the assembly step can become a bottleneck, especially in implicit and time-dependent settings where considerable updates are needed. On standard HPC platforms, this process can be vectorized by taking advantage of additional mesh querying data structures. However, on graphics hardware, vectorization is inhibited by limited memory resources. In this paper, we propose a lean unstructured mesh representation, which allows casting the assembly problem as a sparse matrix-matrix multiplication. We demonstrate how the global graph connectivity of the assembled matrix can be captured through basic linear algebra operations, and show how local interactions between nodes/degrees of freedom within an element can be encoded by means of a concise representation, action maps. These ideas not only reduce the memory storage requirements but also cut down on the bulk of data that needs to be moved from global storage to the compute units, which is crucial on parallel computing hardware, and in particular on the GPU. Furthermore, we analyze the effect of mesh memory layout on the assembly performance.
KW - data structures
KW - graph theory
KW - graphics processing units
KW - linear algebra
KW - mathematics computing
KW - matrix multiplication
KW - mesh generation
KW - parallel processing
KW - query processing
KW - sparse matrices
KW - GPU
KW - assembled matrix
KW - assembly performance
KW - assembly problem
KW - assembly step
KW - basic linear algebra operations
KW - elementary contributions
KW - explicit matrix form
KW - global graph connectivity
KW - graphics hardware
KW - lean unstructured mesh representation
KW - memory storage requirements
KW - mesh memory layout
KW - mesh querying data structures
KW - multiplication patterns
KW - numerical solvers
KW - numerical treatment
KW - parallel computing hardware
KW - sparse matrix assembly
KW - sparse matrix-matrix multiplication
KW - standard HPC platforms
KW - variational problems
KW - vectorization
KW - Hardware
KW - Matrices
KW - Memory management
KW - Standards
DO - 10.1109/HPEC.2017.8091057
M3 - Conference paper
SP - 1
EP - 8
BT - 2017 IEEE High Performance Extreme Computing Conference (HPEC)
T2 - 2017 IEEE High Performance Extreme Computing Conference
Y2 - 12 September 2017 through 14 September 2017
ER -