TY - GEN
T1 - Cross-layer dynamic prefetching allocation strategies for high-performance multicores
AU - Peng, Yin Chi
AU - Chen, Chien Chih
AU - Chang, Chia Jung
AU - Chen, Tien-Fu
AU - Yew, Pen Chung
PY - 2013
Y1 - 2013
N2 - Over the last decade, a variety of hardware prefetching techniques have been proposed to improve system performance. However, due to limited space and bandwidth in a multicore system, data fetched by a prefetcher may pollute the L1 cache even when the data is useful, resulting in significant performance degradation. Most contemporary multicore systems simply disable prefetching to avoid unexpected contention. This paper proposes a cross-layer, dynamic Prefetch Allocation Management (PAM) scheme to provide better caching strategies in a parallel environment. Our approach has two main mechanisms, which target the choice of prefetch degree and prefetch location to minimize cache pollution and contention. Across a variety of SPLASH2 and PARSEC benchmarks, our PAM approach yields up to a 12% performance improvement on a 4-core multicore system compared to a static prefetcher configuration, and it also reduces the memory system's bandwidth consumption by 9.1%.
AB - Over the last decade, a variety of hardware prefetching techniques have been proposed to improve system performance. However, due to limited space and bandwidth in a multicore system, data fetched by a prefetcher may pollute the L1 cache even when the data is useful, resulting in significant performance degradation. Most contemporary multicore systems simply disable prefetching to avoid unexpected contention. This paper proposes a cross-layer, dynamic Prefetch Allocation Management (PAM) scheme to provide better caching strategies in a parallel environment. Our approach has two main mechanisms, which target the choice of prefetch degree and prefetch location to minimize cache pollution and contention. Across a variety of SPLASH2 and PARSEC benchmarks, our PAM approach yields up to a 12% performance improvement on a 4-core multicore system compared to a static prefetcher configuration, and it also reduces the memory system's bandwidth consumption by 9.1%.
UR - http://www.scopus.com/inward/record.url?scp=84881361051&partnerID=8YFLogxK
U2 - 10.1109/VLDI-DAT.2013.6533864
DO - 10.1109/VLDI-DAT.2013.6533864
M3 - Conference contribution
AN - SCOPUS:84881361051
SN - 9781467344357
T3 - 2013 International Symposium on VLSI Design, Automation, and Test, VLSI-DAT 2013
BT - 2013 International Symposium on VLSI Design, Automation, and Test, VLSI-DAT 2013
T2 - 2013 International Symposium on VLSI Design, Automation, and Test, VLSI-DAT 2013
Y2 - 22 April 2013 through 24 April 2013
ER -