Mapping-Free GPU Offloading in OpenMP Using Unified Memory

Jia Sian Hong, Yi Ping You

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

With the increasing demand for heterogeneous computing, OpenMP has, since version 4.0, provided an offloading feature that allows programmers to offload a task to a device (e.g., a GPU or an FPGA) by adding appropriate directives to the task. Compared with low-level programming models such as CUDA and OpenCL, OpenMP significantly reduces the burden on programmers of ensuring that tasks are performed correctly on the device. However, OpenMP still has a data-mapping problem, which arises because the host and the device have separate memory spaces: programmers must still specify data-mapping directives to indicate how data are transferred between the host and the device. For complex data structures such as linked lists and graphs, composing reliable and efficient data-mapping directives becomes more difficult. Moreover, the OpenMP runtime library may incur substantial overhead due to data-mapping management. In this paper, we propose a compiler and runtime collaborative framework, called OpenMP-UM, to address the data-mapping problem. Using the CUDA unified memory mechanism, OpenMP-UM eliminates the need for data-mapping directives and reduces the overhead associated with data-mapping management. The key concept behind OpenMP-UM is to use unified memory as the default memory storage for all host data, including automatic, static, and dynamic data. Experiments demonstrate that OpenMP-UM not only removes the programmer's burden of writing data-mapping directives for offloading in OpenMP applications but also achieves an average speedup of 7.3x for applications that involve deep copies and an average speedup of 1.02x for regular applications.

Original language: English
Title of host publication: 52nd International Conference on Parallel Processing, ICPP 2023 - Workshops Proceedings
Publisher: Association for Computing Machinery
Pages: 104-111
Number of pages: 8
ISBN (Electronic): 9798400708435
State: Published - 7 Aug 2023
Event: 52nd International Conference on Parallel Processing, ICPP 2023 - Workshops Proceedings - Salt Lake City, United States
Duration: 7 Aug 2023 → 10 Aug 2023

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 52nd International Conference on Parallel Processing, ICPP 2023 - Workshops Proceedings
Country/Territory: United States
City: Salt Lake City
Period: 7/08/23 → 10/08/23

Keywords

  • CUDA
  • heterogeneous computing
  • OpenMP offloading
  • unified memory
