Sparse matrix assembly on the GPU through multiplication patterns

R. Zayer, M. Steinberger, H. P. Seidel

Research output: Chapter in Book/Report/Conference proceedingConference paperpeer-review


The numerical treatment of variational problems gives rise to large sparse matrices, which are typically assembled by coalescing elementary contributions. As the explicit matrix form is required by numerical solvers, the assembly step can be a potential bottleneck, especially in implicit and time dependent settings where considerable updates are needed. On standard HPC platforms, this process can be vectorized by taking advantage of additional mesh querying data structures. However, on graphics hardware, vectorization is inhibited by limited memory resources. In this paper, we propose a lean unstructured mesh representation, which allows casting the assembly problem as a sparse matrix-matrix multiplication. We demonstrate how the global graph connectivity of the assembled matrix can be captured through basic linear algebra operations and show how local interactions between nodes/degrees of freedom within an element can be encoded by means of concise representation, action maps. These ideas not only reduce the memory storage requirements but also cut down on the bulk of data that needs to be moved from global storage to the compute units, which is crucial on parallel computing hardware, and in particular on the GPU. Furthermore, we analyze the effect of mesh memory layout on the assembly performance.
Original languageEnglish
Title of host publication2017 IEEE High Performance Extreme Computing Conference (HPEC)
Number of pages8
Publication statusPublished - 1 Sept 2017
Event2017 IEEE High Performance Extreme Computing Conference: HPEC 2017 - Westin Hotel, Waltham, United States
Duration: 12 Sept 201714 Sept 2017


Conference2017 IEEE High Performance Extreme Computing Conference
Abbreviated titleHPEC ‘17
Country/TerritoryUnited States
Internet address


  • data structures
  • graph theory
  • graphics processing units
  • linear algebra
  • mathematics computing
  • matrix multiplication
  • mesh generation
  • parallel processing
  • query processing
  • sparse matrices
  • GPU
  • assembled matrix
  • assembly performance
  • assembly problem
  • assembly step
  • basic linear algebra operations
  • elementary contributions
  • explicit matrix form
  • global graph connectivity
  • graphics hardware
  • lean unstructured mesh representation
  • memory storage requirements
  • mesh memory layout
  • mesh querying data structures
  • multiplication patterns
  • numerical solvers
  • numerical treatment
  • parallel computing hardware
  • sparse matrix assembly
  • sparse matrix-matrix multiplication
  • standard HPC platforms
  • variational problems
  • vectorization
  • Graphics processing units
  • Hardware
  • Matrices
  • Memory management
  • Sparse matrices
  • Standards

Cite this