TY - GEN
T1 - Multi-level parallelism analysis of face detection on a shared memory multi-core system
AU - Chiang, Chih Hsuan
AU - Kao, Chih Heng
AU - Li, Guan Ru
AU - Lai, Bo-Cheng
PY - 2011
Y1 - 2011
N2 - Face detection is one of the fundamental technologies for future smart objects. However, its computational intensity hinders real-time operation on embedded devices. Parallel processing and many-core architectures have become a mainstream approach to achieving high performance in future computing systems. An application's parallelism must be exposed before a parallel architecture can exploit it. This paper presents a comprehensive analysis of the parallelism of a face detection algorithm at different algorithmic levels. The analysis shows that each parallelism level offers its own potential to enhance performance but also imposes different limiting factors on overall performance. Based on the analysis results and design experience, the paper proposes a multi-staged, mixed-level parallelization scheme that retains performance scalability and avoids the limiting factors. With this scheme, we achieve up to a 37.5x performance improvement on a 64-core system.
UR - http://www.scopus.com/inward/record.url?scp=79959503447&partnerID=8YFLogxK
U2 - 10.1109/VDAT.2011.5783540
DO - 10.1109/VDAT.2011.5783540
M3 - Conference contribution
AN - SCOPUS:79959503447
SN - 9781424484997
T3 - Proceedings of 2011 International Symposium on VLSI Design, Automation and Test, VLSI-DAT 2011
SP - 328
EP - 331
BT - Proceedings of 2011 International Symposium on VLSI Design, Automation and Test, VLSI-DAT 2011
T2 - 2011 International Symposium on VLSI Design, Automation and Test, VLSI-DAT 2011
Y2 - 25 April 2011 through 28 April 2011
ER -