Real-time human interface (HI) systems need accurate and efficient hand detection models that can operate within tight constraints on budget, physical size, memory, computation, and electric power. Hand detection is also important for other applications such as homecare systems, fine-grained action recognition, movie interpretation, and even the understanding of dance gestures. In recent years, object detection has become less challenging thanks to deep CNN-based state-of-the-art (SoTA) models such as RCNN, SSD, and YOLO. However, these models cannot achieve the desired efficiency and accuracy on HI-based embedded devices because of their complex, time-consuming architectures. Another critical issue is that small hands (<30 × 30 pixels) remain challenging for all of the above methods. To address these problems, we propose a shallow model named Multi-fusion Feature Pyramid for real-time hand detection. Experimental results on the Oxford hand dataset combined with the skin dataset show that the proposed method outperforms other SoTA methods in terms of accuracy, efficiency, and real-time speed. On the COCO dataset, the proposed CFPN model also achieves the highest efficiency and accuracy among the compared SoTA methods. We therefore conclude that the proposed model is useful for real-life small hand detection on embedded devices.