Deadline-Aware Offloading for High-Throughput Accelerators

Tsung Tai Yeh, Matthew D. Sinclair, Bradford M. Beckmann, Timothy G. Rogers

Research output: Conference contribution, peer-reviewed

11 citations (Scopus)


Contemporary GPUs are widely used for throughput-oriented data-parallel workloads and increasingly are being considered for latency-sensitive applications in datacenters. Examples include recurrent neural network (RNN) inference, network packet processing, and intelligent personal assistants. These data parallel applications have both high throughput demands and real-time deadlines (40μs-7ms). Moreover, the kernels in these applications have relatively few threads that do not fully utilize the device unless a large batch size is used. However, batching forces jobs to wait, which increases their latency, especially when realistic job arrival times are considered.

Previously, programmers have managed the tradeoffs associated with concurrent, latency-sensitive jobs by using a combination of GPU streams and advanced scheduling algorithms running on the CPU host. Although GPU streams allow the accelerator to execute multiple jobs concurrently, prior state-of-the-art solutions use the relatively distant CPU host to prioritize the latency-sensitive GPU tasks. Thus, these approaches are forced to operate at a coarse granularity and cannot quickly adapt to rapidly changing program behavior.

We observe that fine-grain, device-integrated kernel schedulers efficiently meet the deadlines of concurrent, latency-sensitive GPU jobs. To overcome the limitations of software-only, CPU-side approaches, we extend the GPU queue scheduler to manage real-time deadlines. We propose a novel laxity-aware scheduler (LAX) that uses information collected within the GPU to dynamically vary job priority based on how much laxity jobs have before their deadline. Compared to contemporary GPUs, 3 state-of-the-art CPU-side schedulers and 6 other advanced GPU-side schedulers, LAX meets the deadlines of 1.7X-5.0X more jobs and provides better energy-efficiency, throughput, and 99-percentile tail latency.

Title of host publication: Proceeding - 27th IEEE International Symposium on High Performance Computer Architecture, HPCA 2021
Publisher: IEEE Computer Society
Publication status: Published - Feb 2021
Event: 27th Annual IEEE International Symposium on High Performance Computer Architecture, HPCA 2021 - Virtual, Seoul, Korea, Republic of
Duration: 27 Feb 2021 to 1 Mar 2021


Publication series name: Proceedings - International Symposium on High-Performance Computer Architecture


Conference: 27th Annual IEEE International Symposium on High Performance Computer Architecture, HPCA 2021
Country/Territory: Korea, Republic of
City: Virtual, Seoul

