TY - GEN
T1 - MCLB
T2 - 37th IEEE International System-on-Chip Conference, SOCC 2024
AU - Geraeinejad, Vahid
AU - Chen, Kun Chih Jimmy
AU - Lu, Zhonghai
AU - Ebrahimi, Masoumeh
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Graphics Processing Units (GPUs) play a pivotal role as primary devices for executing a diverse range of applications. Effective load balancing of the interconnection network is crucial in distributed computing systems as it ensures optimal resource utilization. While previous studies have addressed interconnection network load balancing, our investigation reveals that GPU cores often exhibit a uniform load pattern due to the nature of their workloads. However, we found that memory controllers experience varying loads, potentially leading to stall cycles during which a memory request cannot enter a specific controller's full queue and therefore remains in the interconnection network. Introducing the concept of 'busy' and 'relaxed' memory controllers, our proposed method, Memory Controller Load Balancing (MCLB), dynamically balances the load on memory controllers by categorizing them based on a predefined threshold. GPU cores temporarily pause sending memory requests to 'busy' memory controllers, prioritizing 'relaxed' ones. This strategy effectively reduces unnecessary congestion in the interconnection network and improves resource utilization in the memory request path. To our knowledge, MCLB is the first method specifically designed to balance memory controller loads in GPUs. MCLB significantly reduces the total number of memory controller stalls (eliminating them completely in some cases), resulting in latency enhancements. It improves memory request and response roundtrip latency by up to 11.8%, and interconnection network latency by up to 24.6%. This work presents a novel approach to GPU optimization by addressing memory controller load imbalances.
AB - Graphics Processing Units (GPUs) play a pivotal role as primary devices for executing a diverse range of applications. Effective load balancing of the interconnection network is crucial in distributed computing systems as it ensures optimal resource utilization. While previous studies have addressed interconnection network load balancing, our investigation reveals that GPU cores often exhibit a uniform load pattern due to the nature of their workloads. However, we found that memory controllers experience varying loads, potentially leading to stall cycles during which a memory request cannot enter a specific controller's full queue and therefore remains in the interconnection network. Introducing the concept of 'busy' and 'relaxed' memory controllers, our proposed method, Memory Controller Load Balancing (MCLB), dynamically balances the load on memory controllers by categorizing them based on a predefined threshold. GPU cores temporarily pause sending memory requests to 'busy' memory controllers, prioritizing 'relaxed' ones. This strategy effectively reduces unnecessary congestion in the interconnection network and improves resource utilization in the memory request path. To our knowledge, MCLB is the first method specifically designed to balance memory controller loads in GPUs. MCLB significantly reduces the total number of memory controller stalls (eliminating them completely in some cases), resulting in latency enhancements. It improves memory request and response roundtrip latency by up to 11.8%, and interconnection network latency by up to 24.6%. This work presents a novel approach to GPU optimization by addressing memory controller load imbalances.
KW - GPGPU
KW - interconnection network
KW - latency
KW - load balancing
KW - memory controller
KW - stall
UR - http://www.scopus.com/inward/record.url?scp=85210570211&partnerID=8YFLogxK
U2 - 10.1109/SOCC62300.2024.10737822
DO - 10.1109/SOCC62300.2024.10737822
M3 - Conference contribution
AN - SCOPUS:85210570211
T3 - International System on Chip Conference
BT - Proceedings - 2024 IEEE 37th International System-on-Chip Conference, SOCC 2024
A2 - Gohringer, Diana
A2 - Gabler, Uwe
A2 - Harbaum, Tanja
A2 - Hofmann, Klaus
PB - IEEE Computer Society
Y2 - 16 September 2024 through 19 September 2024
ER -