TY - JOUR
T1 - Multi-fusion feature pyramid for real-time hand detection
AU - Chang, Chuan Wang
AU - Santra, Santanu
AU - Hsieh, Jun Wei
AU - Hendri, Pirdiansyah
AU - Lin, Chi Fang
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2022
Y1 - 2022
N2 - Real-time HI (Human Interface) systems need accurate and efficient hand detection models that fit within limited budget, size, memory, computing, and electric power resources. Hand detection is also important for other applications such as homecare systems, fine-grained action recognition, movie interpretation, and even the understanding of dance gestures. In recent years, object detection has become less challenging with the latest deep CNN-based state-of-the-art models, e.g., RCNN, SSD, and YOLO. However, these models cannot achieve the desired efficiency and accuracy on HI-based embedded devices because of their complex, time-consuming architectures. Another critical issue in hand detection is that small hands (<30 × 30 pixels) remain challenging for all of the above methods. To deal with these problems, we propose a shallow model named Multi-fusion Feature Pyramid for real-time hand detection. Experimental results on the Oxford hand dataset combined with the skin dataset show that the proposed method outperforms other state-of-the-art (SoTA) methods in terms of accuracy, efficiency, and real-time speed. The COCO dataset is also used for comparison with other state-of-the-art methods, on which the proposed CFPN model shows the highest efficiency and accuracy. We therefore conclude that the proposed model is useful for real-life small hand detection on embedded devices.
AB - Real-time HI (Human Interface) systems need accurate and efficient hand detection models that fit within limited budget, size, memory, computing, and electric power resources. Hand detection is also important for other applications such as homecare systems, fine-grained action recognition, movie interpretation, and even the understanding of dance gestures. In recent years, object detection has become less challenging with the latest deep CNN-based state-of-the-art models, e.g., RCNN, SSD, and YOLO. However, these models cannot achieve the desired efficiency and accuracy on HI-based embedded devices because of their complex, time-consuming architectures. Another critical issue in hand detection is that small hands (<30 × 30 pixels) remain challenging for all of the above methods. To deal with these problems, we propose a shallow model named Multi-fusion Feature Pyramid for real-time hand detection. Experimental results on the Oxford hand dataset combined with the skin dataset show that the proposed method outperforms other state-of-the-art (SoTA) methods in terms of accuracy, efficiency, and real-time speed. The COCO dataset is also used for comparison with other state-of-the-art methods, on which the proposed CFPN model shows the highest efficiency and accuracy. We therefore conclude that the proposed model is useful for real-life small hand detection on embedded devices.
KW - Embedded system
KW - Hand detection
KW - Human
KW - Object detection
KW - YOLOV4
UR - http://www.scopus.com/inward/record.url?scp=85125523185&partnerID=8YFLogxK
U2 - 10.1007/s11042-021-11897-7
DO - 10.1007/s11042-021-11897-7
M3 - Article
AN - SCOPUS:85125523185
SN - 1380-7501
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
ER -