Fine-Grained Memory Profiling of GPGPU Kernels

Max von Buelow; Stefan Guthe; Dieter W. Fellner

doi:10.1111/cgf.14671

Institute of Computer Graphics and Knowledge Visualisation (7110)

Research output: Contribution to journal › Article › peer-review

Abstract

Memory performance is a crucial bottleneck in many GPGPU applications, making optimizations for hardware and software mandatory. While hardware vendors already use highly efficient caching architectures, software engineers usually have to organize their data accordingly to make efficient use of them, which requires deep knowledge of the actual hardware. In this paper, we present a novel technique for fine-grained memory profiling that simulates the whole pipeline of memory flow and finally accumulates profiling values in a way that the user retains information about the potential region in the GPU program by showing these values separately for each allocation. Our memory simulator turns out to outperform state-of-the-art memory models of NVIDIA architectures by a magnitude of 2.4 for the L1 cache and 1.3 for the L2 cache, in terms of accuracy. Additionally, we find our technique of fine-grained memory profiling a useful tool for memory optimizations, which we successfully show in the case of ray tracing and machine learning applications.
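The idea of accumulating profiling values separately for each allocation can be illustrated with a toy example. The following is a minimal sketch, not the authors' simulator: it assumes a single small set-associative cache with LRU replacement, and the cache geometry, the `profile` function, and the allocation names `rays` and `nodes` are all hypothetical choices made for illustration.

```python
from bisect import bisect_right
from collections import OrderedDict

# Assumed toy cache geometry; real GPU caches differ.
LINE_SIZE = 128   # bytes per cache line
NUM_SETS = 4      # number of sets
WAYS = 2          # lines per set (associativity)


class LRUCache:
    """Toy set-associative cache with LRU replacement."""

    def __init__(self, num_sets=NUM_SETS, ways=WAYS, line_size=LINE_SIZE):
        self.num_sets, self.ways, self.line_size = num_sets, ways, line_size
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, address):
        """Return True on a hit, False on a miss (the line is then filled)."""
        line = address // self.line_size
        entries = self.sets[line % self.num_sets]
        if line in entries:
            entries.move_to_end(line)      # mark as most recently used
            return True
        if len(entries) >= self.ways:
            entries.popitem(last=False)    # evict the least recently used line
        entries[line] = True
        return False


def profile(trace, allocations):
    """Count cache hits and misses per allocation (hypothetical helper).

    trace: iterable of byte addresses in access order.
    allocations: list of (name, base, size) with non-overlapping ranges.
    """
    allocs = sorted(allocations, key=lambda a: a[1])
    bases = [a[1] for a in allocs]
    stats = {name: {"hits": 0, "misses": 0} for name, _, _ in allocs}
    cache = LRUCache()
    for address in trace:
        hit = cache.access(address)
        i = bisect_right(bases, address) - 1   # last allocation starting at or before address
        if i < 0:
            continue
        name, base, size = allocs[i]
        if address >= base + size:
            continue                           # outside every known allocation
        stats[name]["hits" if hit else "misses"] += 1
    return stats


# Sequential 4-byte reads from "rays", then 512-byte strided reads from "nodes".
trace = list(range(0, 256, 4)) + [4096 + k * 512 for k in range(8)]
stats = profile(trace, [("rays", 0, 4096), ("nodes", 4096, 4096)])
print(stats["rays"])   # {'hits': 62, 'misses': 2}
print(stats["nodes"])  # {'hits': 0, 'misses': 8}
```

In this sketch, the per-allocation breakdown shows that the strided accesses to `nodes` never hit, while a single aggregate hit rate would hide which buffer is responsible.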

Original language	English
Pages (from-to)	227-235
Number of pages	9
Journal	Computer Graphics Forum
Volume	41
Issue number	7
DOIs	https://doi.org/10.1111/cgf.14671
Publication status	Published - Oct 2022

Keywords

CCS Concepts
• Computing methodologies → Graphics processors
• Hardware → Simulation and emulation
• Theory of computation → Program analysis

ASJC Scopus subject areas

Computer Graphics and Computer-Aided Design

Access to Document

10.1111/cgf.14671

Licence: CC BY 4.0

Cite this

@article{d1e5db52bbe44549b7cf5ae0b0b49025,

title = "Fine-Grained Memory Profiling of GPGPU Kernels",

abstract = "Memory performance is a crucial bottleneck in many GPGPU applications, making optimizations for hardware and software mandatory. While hardware vendors already use highly efficient caching architectures, software engineers usually have to organize their data accordingly in order to efficiently make use of these, requiring deep knowledge of the actual hardware. In this paper we present a novel technique for fine-grained memory profiling that simulates the whole pipeline of memory flow and finally accumulates profiling values in a way that the user retains information about the potential region in the GPU program by showing these values separately for each allocation. Our memory simulator turns out to outperform state-of-the-art memory models of NVIDIA architectures by a magnitude of 2.4 for the L1 cache and 1.3 for the L2 cache, in terms of accuracy. Additionally, we find our technique of fine grained memory profiling a useful tool for memory optimizations, which we successfully show in case of ray tracing and machine learning applications.",

keywords = "Computing methodologies → Graphics processors, Hardware → Simulation and emulation, Theory of computation → Program analysis",

author = "von Buelow, Max and Guthe, Stefan and Fellner, Dieter W.",

note = "Publisher Copyright: {\textcopyright} 2023 The Authors. Computer Graphics Forum published by Eurographics - The European Association for Computer Graphics and John Wiley \& Sons Ltd.",

year = "2022",

month = oct,

doi = "10.1111/cgf.14671",

language = "English",

volume = "41",

pages = "227--235",

journal = "Computer Graphics Forum",

issn = "0167-7055",

publisher = "Wiley-Blackwell",

number = "7",

}

TY - JOUR

T1 - Fine-Grained Memory Profiling of GPGPU Kernels

AU - Buelow, Max von

AU - Guthe, Stefan

AU - Fellner, Dieter W.

PY - 2022/10

Y1 - 2022/10

N2 - Memory performance is a crucial bottleneck in many GPGPU applications, making optimizations for hardware and software mandatory. While hardware vendors already use highly efficient caching architectures, software engineers usually have to organize their data accordingly in order to efficiently make use of these, requiring deep knowledge of the actual hardware. In this paper we present a novel technique for fine-grained memory profiling that simulates the whole pipeline of memory flow and finally accumulates profiling values in a way that the user retains information about the potential region in the GPU program by showing these values separately for each allocation. Our memory simulator turns out to outperform state-of-the-art memory models of NVIDIA architectures by a magnitude of 2.4 for the L1 cache and 1.3 for the L2 cache, in terms of accuracy. Additionally, we find our technique of fine grained memory profiling a useful tool for memory optimizations, which we successfully show in case of ray tracing and machine learning applications.

KW - Computing methodologies → Graphics processors

KW - Hardware → Simulation and emulation

KW - Theory of computation → Program analysis

UR - http://www.scopus.com/inward/record.url?scp=85150680183&partnerID=8YFLogxK

U2 - 10.1111/cgf.14671

DO - 10.1111/cgf.14671

M3 - Article

AN - SCOPUS:85150680183

SN - 0167-7055

VL - 41

SP - 227

EP - 235

JO - Computer Graphics Forum

JF - Computer Graphics Forum

IS - 7

ER -
