Fine-Grained Memory Profiling of GPGPU Kernels

Max von Buelow, Stefan Guthe, Dieter W. Fellner

Research output: Contribution to journal › Article › peer-review

Abstract

Memory performance is a crucial bottleneck in many GPGPU applications, making optimizations for hardware and software mandatory. While hardware vendors already use highly efficient caching architectures, software engineers usually have to organize their data accordingly to make efficient use of them, which requires deep knowledge of the actual hardware. In this paper, we present a novel technique for fine-grained memory profiling that simulates the whole pipeline of memory flow and accumulates profiling values separately for each allocation, so that the user retains information about the potential region in the GPU program. Our memory simulator outperforms state-of-the-art memory models of NVIDIA architectures by a factor of 2.4 for the L1 cache and 1.3 for the L2 cache in terms of accuracy. Additionally, we find our technique of fine-grained memory profiling to be a useful tool for memory optimizations, which we demonstrate for ray tracing and machine learning applications.

Original language: English
Pages (from-to): 227-235
Number of pages: 9
Journal: Computer Graphics Forum
Volume: 41
Issue number: 7
Publication status: Published - Oct 2022

Keywords

  • CCS Concepts
      • Computing methodologies → Graphics processors
      • Hardware → Simulation and emulation
      • Theory of computation → Program analysis

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
