TY - GEN
T1 - Thread affinity mapping for irregular data access on shared cache GPGPU
AU - Kuo, Hsien Kai
AU - Chen, Kuan Ting
AU - Lai, Bo-Cheng
AU - Jou, Jing Yang
PY - 2012
Y1 - 2012
N2 - Memory Coalescing and on-chip shared Cache are two effective techniques to alleviate the memory bottleneck in modern GPGPUs. These two techniques are very useful on applications with regular memory accesses. However, they become ineffective on concurrent threads with large numbers of uncoordinated accesses and the potential performance benefit could be significantly degraded. This paper proposes a thread affinity mapping methodology to coordinate the irregular data accesses on shared cache GPGPUs. Based on the proposed affinity metrics, threads are congregated into execution groups which are able to fully exploit the memory coalescing and data sharing within an application. An average of 3.5x runtime speedup is achieved on a Fermi GPGPU. The speedup scales with the sizes of test cases, which makes the proposed methodology an effective and promising solution for the continually increasing complexities of applications in the future many-core systems.
AB - Memory Coalescing and on-chip shared Cache are two effective techniques to alleviate the memory bottleneck in modern GPGPUs. These two techniques are very useful on applications with regular memory accesses. However, they become ineffective on concurrent threads with large numbers of uncoordinated accesses and the potential performance benefit could be significantly degraded. This paper proposes a thread affinity mapping methodology to coordinate the irregular data accesses on shared cache GPGPUs. Based on the proposed affinity metrics, threads are congregated into execution groups which are able to fully exploit the memory coalescing and data sharing within an application. An average of 3.5x runtime speedup is achieved on a Fermi GPGPU. The speedup scales with the sizes of test cases, which makes the proposed methodology an effective and promising solution for the continually increasing complexities of applications in the future many-core systems.
UR - http://www.scopus.com/inward/record.url?scp=84860003663&partnerID=8YFLogxK
U2 - 10.1109/ASPDAC.2012.6165038
DO - 10.1109/ASPDAC.2012.6165038
M3 - Conference contribution
AN - SCOPUS:84860003663
SN - 9781467307727
T3 - Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC
SP - 659
EP - 664
BT - ASP-DAC 2012 - 17th Asia and South Pacific Design Automation Conference
T2 - 17th Asia and South Pacific Design Automation Conference, ASP-DAC 2012
Y2 - 30 January 2012 through 2 February 2012
ER -