Pagoda: Fine-Grained GPU Resource Virtualization for Narrow Tasks

Tsung Tai Yeh, Amit Sabne, Putt Sakdhnagool, Rudolf Eigenmann, Timothy G. Rogers

Research output: Conference contribution, peer-reviewed

21 Citations (Scopus)

Abstract

Massively multithreaded GPUs achieve high throughput by running thousands of threads in parallel. To fully utilize the hardware, workloads spawn work to the GPU in bulk by launching large tasks, where each task is a kernel that contains thousands of threads that occupy the entire GPU. GPUs face severe underutilization and their performance benefits vanish if the tasks are narrow, i.e., they contain < 500 threads. Latency-sensitive applications in network, signal, and image processing that generate a large number of tasks with relatively small inputs are examples of such limited parallelism. This paper presents Pagoda, a runtime system that virtualizes GPU resources, using an OS-like daemon kernel called MasterKernel. Tasks are spawned from the CPU onto Pagoda as they become available, and are scheduled by the MasterKernel at the warp granularity. Experimental results demonstrate that Pagoda achieves a geometric mean speedup of 5.70x over PThreads running on a 20-core CPU, 1.51x over CUDA-HyperQ, and 1.69x over GeMTC, the state-of-the-art runtime GPU task scheduling system.
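To illustrate what scheduling "at the warp granularity" can mean for narrow tasks, the following is a minimal host-side sketch in Python. It is not Pagoda's implementation and does not model the MasterKernel itself. It assumes a warp size of 32 threads (as on NVIDIA GPUs), a hypothetical fixed pool of warp slots, and a simplified round-based FIFO policy. The names `warps_needed`, `schedule`, and the task list are invented for this example.

```python
import math
from collections import deque

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def warps_needed(num_threads: int) -> int:
    """Number of warp slots a task occupies when placed at warp granularity."""
    return math.ceil(num_threads / WARP_SIZE)


def schedule(tasks, total_warps):
    """Greedy FIFO placement of narrow tasks onto a fixed pool of warp slots.

    tasks: list of (task_id, num_threads) in arrival order (hypothetical input).
    total_warps: size of the warp pool (hypothetical parameter).
    Returns a list of rounds; each round lists the task ids that run together.
    This round-based model is a simplification chosen for illustration.
    """
    pending = deque(tasks)
    rounds = []
    while pending:
        free = total_warps
        running = []
        # Admit tasks in arrival order while enough warp slots remain.
        while pending and warps_needed(pending[0][1]) <= free:
            task_id, num_threads = pending.popleft()
            free -= warps_needed(num_threads)
            running.append(task_id)
        if not running:
            raise ValueError("task needs more warps than the whole pool holds")
        rounds.append(running)
    return rounds
```

Under these assumptions, several narrow tasks share the pool in one round instead of each occupying the device alone, which is the kind of co-scheduling the abstract describes at a high level.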

Original language: English
Title of host publication: PPoPP 2017 - Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Publisher: Association for Computing Machinery
Pages: 221-234
Number of pages: 14
ISBN (Electronic): 9781450344937
DOIs
Publication status: Published - 26 Jan 2017
Event: 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2017 - Austin, United States
Duration: 4 Feb 2017 to 8 Feb 2017

Publication series

Name: Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP

Conference

Conference: 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2017
Country/Territory: United States
City: Austin
Period: 4/02/17 to 8/02/17
