TY - GEN
T1 - Mapping-Free GPU Offloading in OpenMP Using Unified Memory
AU - Hong, Jia Sian
AU - You, Yi Ping
N1 - Publisher Copyright:
© 2023 ACM.
PY - 2023/8/7
Y1 - 2023/8/7
N2 - With the increasing demand for heterogeneous computing, OpenMP has, since version 4.0, provided an offloading feature that allows programmers to offload a task to a device (e.g., a GPU or an FPGA) by adding appropriate directives to the task. Compared to low-level programming models such as CUDA and OpenCL, OpenMP significantly reduces the burden on programmers to ensure that tasks are performed correctly on the device. However, OpenMP still has a data-mapping problem, which arises from the separate memory spaces of the host and the device: programmers must still specify data-mapping directives to indicate how data are transferred between the host and the device. For complex data structures such as linked lists and graphs, composing reliable and efficient data-mapping directives becomes even more difficult. Moreover, the OpenMP runtime library may incur substantial overhead due to data-mapping management. In this paper, we propose a compiler-runtime collaborative framework, called OpenMP-UM, to address the data-mapping problem. Using the CUDA unified memory mechanism, OpenMP-UM eliminates the need for data-mapping directives and reduces the overhead associated with data-mapping management. The key concept behind OpenMP-UM is to use unified memory as the default storage for all host data, including automatic, static, and dynamic data. Experiments demonstrate that OpenMP-UM not only removes the programmer's burden of writing data-mapping directives when offloading in OpenMP applications but also achieves an average speedup of 7.3x for applications that involve deep copies and an average speedup of 1.02x for regular applications.
AB - With the increasing demand for heterogeneous computing, OpenMP has, since version 4.0, provided an offloading feature that allows programmers to offload a task to a device (e.g., a GPU or an FPGA) by adding appropriate directives to the task. Compared to low-level programming models such as CUDA and OpenCL, OpenMP significantly reduces the burden on programmers to ensure that tasks are performed correctly on the device. However, OpenMP still has a data-mapping problem, which arises from the separate memory spaces of the host and the device: programmers must still specify data-mapping directives to indicate how data are transferred between the host and the device. For complex data structures such as linked lists and graphs, composing reliable and efficient data-mapping directives becomes even more difficult. Moreover, the OpenMP runtime library may incur substantial overhead due to data-mapping management. In this paper, we propose a compiler-runtime collaborative framework, called OpenMP-UM, to address the data-mapping problem. Using the CUDA unified memory mechanism, OpenMP-UM eliminates the need for data-mapping directives and reduces the overhead associated with data-mapping management. The key concept behind OpenMP-UM is to use unified memory as the default storage for all host data, including automatic, static, and dynamic data. Experiments demonstrate that OpenMP-UM not only removes the programmer's burden of writing data-mapping directives when offloading in OpenMP applications but also achieves an average speedup of 7.3x for applications that involve deep copies and an average speedup of 1.02x for regular applications.
KW - CUDA
KW - heterogeneous computing
KW - OpenMP offloading
KW - unified memory
UR - http://www.scopus.com/inward/record.url?scp=85175042507&partnerID=8YFLogxK
U2 - 10.1145/3605731.3605907
DO - 10.1145/3605731.3605907
M3 - Conference contribution
AN - SCOPUS:85175042507
T3 - ACM International Conference Proceeding Series
SP - 104
EP - 111
BT - 52nd International Conference on Parallel Processing, ICPP 2023 - Workshops Proceedings
PB - Association for Computing Machinery
T2 - 52nd International Conference on Parallel Processing, ICPP 2023 - Workshops Proceedings
Y2 - 7 August 2023 through 10 August 2023
ER -
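
A minimal sketch in C (not from the cited paper) of the contrast the abstract describes: conventional OpenMP offloading requires explicit map clauses, whereas a mapping-free, unified-memory style lets the same kernel be offloaded without them. The compiler invocation and the assumption that host allocations are backed by CUDA unified memory (as OpenMP-UM is described as providing) are illustrative assumptions, not details taken from the paper.

    /* Sketch (not from the paper): explicit data-mapping offload vs. the
     * mapping-free style OpenMP-UM is described as enabling.  Assumes an
     * OpenMP compiler with target offloading, e.g.
     *   clang -fopenmp -fopenmp-targets=nvptx64 saxpy.c   (assumed setup) */
    #include <stdio.h>
    #include <stdlib.h>

    #define N 1024

    /* Conventional offload: the programmer writes map clauses so the
     * runtime knows how to copy each array between host and device. */
    void saxpy_mapped(float a, float *x, float *y, int n) {
        #pragma omp target teams distribute parallel for \
            map(to: x[0:n]) map(tofrom: y[0:n])
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }

    /* Mapping-free style: if host data are backed by unified (managed)
     * memory, the same kernel can be offloaded without any map clauses;
     * host and device dereference the same pointers. */
    void saxpy_unified(float a, float *x, float *y, int n) {
        #pragma omp target teams distribute parallel for
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }

    int main(void) {
        float *x = malloc(N * sizeof *x);
        float *y = malloc(N * sizeof *y);
        for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        saxpy_mapped(3.0f, x, y, N);   /* explicit data mapping */
        /* saxpy_unified() additionally assumes unified shared memory,
         * e.g. via "#pragma omp requires unified_shared_memory" or a
         * framework such as OpenMP-UM that backs host allocations with
         * CUDA managed memory; it is shown here only for contrast. */
        printf("y[0] = %f\n", y[0]);
        free(x);
        free(y);
        return 0;
    }

The point of the contrast is the one the abstract makes: with unified memory as the default backing store for host data, the map clauses (and the runtime bookkeeping behind them) become unnecessary, which matters most for pointer-rich structures that would otherwise require deep copies.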