TY - GEN
T1 - Set utilization based dynamic shared cache partitioning
AU - Deayton, Peter
AU - Chung, Chung-Ping
PY - 2011
Y1 - 2011
N2 - As the number of processors sharing a cache increases, conflict misses due to interference amongst competing processes have an increasing impact on the individual performance of processes. Cache partitioning is a method of allocating a cache between concurrently executing processes in order to counteract the effects of inter-process conflicts. However, cache partitioning methods commonly divide a shared cache into private partitions dedicated to a single processor, which can lead to underutilized portions of the cache when set accesses are non-uniform. Our proposed method compliments these cache partitioning algorithms by creating an additional shared partition able to be shared amongst all processors. Underutilized areas of the cache are identified by a monitoring circuit and used for the shared partition. Detection of underutilization is based on the number of unique set accesses for a given allocated way. For a 16-way set associative cache, the implementation of our method requires 64 bytes of storage overhead per core in addition to that needed for the method that determines the sizes of the private partitions. For the tested system, our method is able to improve performance over the traditional LRU policy for a number of selected benchmark sets by an average of 1.4% and up to 13.3% for a two core system and an average of 1.4% and up to 7.8% for a four core system, and is able to improve the performance of a conventional cache partitioning method (Utility-Based Cache Partitioning) by an average of 0.1% and up to 0.5% for both a two and four core systems.
AB - As the number of processors sharing a cache increases, conflict misses due to interference amongst competing processes have an increasing impact on the individual performance of processes. Cache partitioning is a method of allocating a cache between concurrently executing processes in order to counteract the effects of inter-process conflicts. However, cache partitioning methods commonly divide a shared cache into private partitions dedicated to a single processor, which can lead to underutilized portions of the cache when set accesses are non-uniform. Our proposed method compliments these cache partitioning algorithms by creating an additional shared partition able to be shared amongst all processors. Underutilized areas of the cache are identified by a monitoring circuit and used for the shared partition. Detection of underutilization is based on the number of unique set accesses for a given allocated way. For a 16-way set associative cache, the implementation of our method requires 64 bytes of storage overhead per core in addition to that needed for the method that determines the sizes of the private partitions. For the tested system, our method is able to improve performance over the traditional LRU policy for a number of selected benchmark sets by an average of 1.4% and up to 13.3% for a two core system and an average of 1.4% and up to 7.8% for a four core system, and is able to improve the performance of a conventional cache partitioning method (Utility-Based Cache Partitioning) by an average of 0.1% and up to 0.5% for both a two and four core systems.
KW - Cache partitioning
KW - Chip multi-processor
KW - Set utilization
KW - Shared cache
UR - http://www.scopus.com/inward/record.url?scp=84856609309&partnerID=8YFLogxK
U2 - 10.1109/ICPADS.2011.119
DO - 10.1109/ICPADS.2011.119
M3 - Conference contribution
AN - SCOPUS:84856609309
SN - 9780769545769
T3 - Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
SP - 284
EP - 291
BT - Proceedings - 2011 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011
T2 - 2011 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011
Y2 - 7 December 2011 through 9 December 2011
ER -