TY - GEN
T1 - Decomposable Architecture and Fault Mitigation Methodology for Deep Learning Accelerators
AU - Huang, Ning Chi
AU - Yang, Min Syue
AU - Chang, Ya Chu
AU - Wu, Kai Chiang
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - As the demand for data analysis increases rapidly, artificial intelligence (AI) models have been developed for various applications. Many deep neural networks are presented with millions or billions of parameters and operations for AI computation. Therefore, many AI accelerators apply pipelined architectures with simple but dense computational elements for numerous operations. However, manufacturing-induced faults cause a challenge to computational robustness or yield degradation on those AI accelerators. In this paper, we propose a fault mitigation methodology based on decomposable systolic arrays. By leveraging the inherent error resilience of AI applications, our data arrangement can reduce the difference between accurate results and faulty results. Additionally, utilizing both our proposed data arrangement and sign compensation can further mitigate the influence of faults in AI accelerators. In the experiments, our proposed fault mitigation methodology can maintain the application accuracy at a certain level, which outperforms state-of-the-art methods. When 0.1% of multiplier-accumulators are faulty in a systolic array, the array with our proposed fault mitigation methodology can have less than 0.5% accuracy loss while executing ResNet-18 for ImageNet classification.
AB - As the demand for data analysis increases rapidly, artificial intelligence (AI) models have been developed for various applications. Many deep neural networks are presented with millions or billions of parameters and operations for AI computation. Therefore, many AI accelerators apply pipelined architectures with simple but dense computational elements for numerous operations. However, manufacturing-induced faults cause a challenge to computational robustness or yield degradation on those AI accelerators. In this paper, we propose a fault mitigation methodology based on decomposable systolic arrays. By leveraging the inherent error resilience of AI applications, our data arrangement can reduce the difference between accurate results and faulty results. Additionally, utilizing both our proposed data arrangement and sign compensation can further mitigate the influence of faults in AI accelerators. In the experiments, our proposed fault mitigation methodology can maintain the application accuracy at a certain level, which outperforms state-of-the-art methods. When 0.1% of multiplier-accumulators are faulty in a systolic array, the array with our proposed fault mitigation methodology can have less than 0.5% accuracy loss while executing ResNet-18 for ImageNet classification.
KW - data arrangement
KW - decomposable systolic array
KW - deep learning accelerator
KW - fault mitigation
KW - sign compensation
UR - http://www.scopus.com/inward/record.url?scp=85161555085&partnerID=8YFLogxK
U2 - 10.1109/ISQED57927.2023.10129283
DO - 10.1109/ISQED57927.2023.10129283
M3 - Conference contribution
AN - SCOPUS:85161555085
T3 - Proceedings - International Symposium on Quality Electronic Design, ISQED
BT - Proceedings of the 24th International Symposium on Quality Electronic Design, ISQED 2023
PB - IEEE Computer Society
T2 - 24th International Symposium on Quality Electronic Design, ISQED 2023
Y2 - 5 April 2023 through 7 April 2023
ER -