TY - GEN
T1 - Scheduling-Aware Prefetching
T2 - 10th IEEE Non-Volatile Memory Systems and Applications Symposium, NVMSA 2021
AU - Wang, Tse-Yuan
AU - Wu, Chun-Feng
AU - Tsao, Che-Wei
AU - Chang, Yuan-Hao
AU - Kuo, Tei-Wei
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - The evolution of Cyber-Physical Systems (CPSs) and the Internet of Things (IoT) enables mobile and smart embedded devices to be equipped with embedded GPUs for accelerating data-intensive applications. To cut down device prices and reduce energy consumption, current GPUs adopt the unified memory architecture to extend memory size by using a PCIe SSD, which is cheaper than directly enlarging the off-chip DRAM on the GPU. However, with the unified memory architecture, data must be moved to the host DRAM before being moved to the off-chip DRAM, which leads to serious contention issues between CPUs and GPUs on the host DRAM. Although the advent of new communication technology gives GPUs the opportunity to directly access the PCIe SSD without passing through the host DRAM, this incurs high data movement costs because the latency gap between the off-chip DRAM and the PCIe SSD is large. To enhance the performance of low-cost, energy-efficient GPU memory systems, this work advocates a hardware-controller-based memory extension solution that not only avoids the contention issues on the host DRAM but also reduces the data movement costs. In particular, we propose a scheduling-aware prefetching design that performs data prefetching by utilizing information from the hardware warp scheduler. The proposed solution was evaluated through a series of intensive experiments, and the results are encouraging.
AB - The evolution of Cyber-Physical Systems (CPSs) and the Internet of Things (IoT) enables mobile and smart embedded devices to be equipped with embedded GPUs for accelerating data-intensive applications. To cut down device prices and reduce energy consumption, current GPUs adopt the unified memory architecture to extend memory size by using a PCIe SSD, which is cheaper than directly enlarging the off-chip DRAM on the GPU. However, with the unified memory architecture, data must be moved to the host DRAM before being moved to the off-chip DRAM, which leads to serious contention issues between CPUs and GPUs on the host DRAM. Although the advent of new communication technology gives GPUs the opportunity to directly access the PCIe SSD without passing through the host DRAM, this incurs high data movement costs because the latency gap between the off-chip DRAM and the PCIe SSD is large. To enhance the performance of low-cost, energy-efficient GPU memory systems, this work advocates a hardware-controller-based memory extension solution that not only avoids the contention issues on the host DRAM but also reduces the data movement costs. In particular, we propose a scheduling-aware prefetching design that performs data prefetching by utilizing information from the hardware warp scheduler. The proposed solution was evaluated through a series of intensive experiments, and the results are encouraging.
UR - http://www.scopus.com/inward/record.url?scp=85124033327&partnerID=8YFLogxK
U2 - 10.1109/NVMSA53655.2021.9628829
DO - 10.1109/NVMSA53655.2021.9628829
M3 - Conference contribution
AN - SCOPUS:85124033327
T3 - Proceedings - 10th IEEE Non-Volatile Memory Systems and Applications Symposium, NVMSA 2021
BT - Proceedings - 10th IEEE Non-Volatile Memory Systems and Applications Symposium, NVMSA 2021
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 18 August 2021 through 19 August 2021
ER -