TY - GEN
T1 - Performance-driven architectural synthesis for distributed register-file microarchitecture considering inter-island delay
AU - Huang, Juinn-Dar
AU - Chen, Chia I.
AU - Hsu, Wan Ling
AU - Lin, Yen Ting
AU - Jou, Jing Yang
PY - 2010/11/8
Y1 - 2010/11/8
N2 - In the deep-submicron era, wire delay is becoming the bottleneck in pursuing high system clock speeds. Several distributed register (DR) architectures have been proposed to cope with this problem by keeping most wires local. In this paper, we propose the distributed register-file microarchitecture with inter-island delay (DRFM-IID). With such delay taken into consideration, the synthesis task is inherently more complicated than one with no inter-island delay concern, since uncertain interconnect latency is very likely to have a serious impact on overall system performance. Hence, we also develop a performance-driven architectural synthesis framework targeting DRFM-IID, which takes the number of inter-island transfers, transfer criticality, and resource utilization into account for better optimization outcomes. The experimental results show that the latency and the number of inter-cluster transfers can be reduced by 26.9% and 37.5% on average, respectively; the latter is a common indicator of the power consumption of on-chip communication.
AB - In the deep-submicron era, wire delay is becoming the bottleneck in pursuing high system clock speeds. Several distributed register (DR) architectures have been proposed to cope with this problem by keeping most wires local. In this paper, we propose the distributed register-file microarchitecture with inter-island delay (DRFM-IID). With such delay taken into consideration, the synthesis task is inherently more complicated than one with no inter-island delay concern, since uncertain interconnect latency is very likely to have a serious impact on overall system performance. Hence, we also develop a performance-driven architectural synthesis framework targeting DRFM-IID, which takes the number of inter-island transfers, transfer criticality, and resource utilization into account for better optimization outcomes. The experimental results show that the latency and the number of inter-cluster transfers can be reduced by 26.9% and 37.5% on average, respectively; the latter is a common indicator of the power consumption of on-chip communication.
UR - http://www.scopus.com/inward/record.url?scp=78049381072&partnerID=8YFLogxK
U2 - 10.1109/VDAT.2010.5496717
DO - 10.1109/VDAT.2010.5496717
M3 - Conference contribution
AN - SCOPUS:78049381072
SN - 9781424452712
T3 - Proceedings of 2010 International Symposium on VLSI Design, Automation and Test, VLSI-DAT 2010
SP - 169
EP - 172
BT - Proceedings of 2010 International Symposium on VLSI Design, Automation and Test, VLSI-DAT 2010
T2 - 2010 International Symposium on VLSI Design, Automation and Test, VLSI-DAT 2010
Y2 - 26 April 2010 through 29 April 2010
ER -