TY - GEN
T1 - IUML: Inception U-Net Based Multi-Task Learning for Density Level Classification And Crowd Density Estimation
AU - Huynh, Van Su
AU - Tran, Vu Hoang
AU - Huang, Ching-Chun
PY - 2019/10/6
Y1 - 2019/10/6
N2 - Nowadays, image-based people counting is an essential technique for public safety management. However, the task remains extremely challenging because of scale issues caused by differences in scene congestion, viewing point, image size, and density level. In this paper, we propose a CNN-based framework for people counting and crowd density map estimation that takes these scale problems into account. First, we introduce an encoder-decoder architecture composed of Inception modules to learn multi-scale feature representations. In addition, to adapt to image resolution, the network is trained with a multi-loss setting over density maps at different resolutions. Second, we apply multi-task learning to learn joint features for the density map estimation task and the density level classification task, which improves the generality of the features across different scenes. Finally, by adopting the U-Net architecture, the encoder and decoder features are fused to generate high-resolution density maps. The efficacy of the proposed method is evaluated through extensive experiments that quantify counting performance under multiple evaluation criteria.
AB - Nowadays, image-based people counting is an essential technique for public safety management. However, the task remains extremely challenging because of scale issues caused by differences in scene congestion, viewing point, image size, and density level. In this paper, we propose a CNN-based framework for people counting and crowd density map estimation that takes these scale problems into account. First, we introduce an encoder-decoder architecture composed of Inception modules to learn multi-scale feature representations. In addition, to adapt to image resolution, the network is trained with a multi-loss setting over density maps at different resolutions. Second, we apply multi-task learning to learn joint features for the density map estimation task and the density level classification task, which improves the generality of the features across different scenes. Finally, by adopting the U-Net architecture, the encoder and decoder features are fused to generate high-resolution density maps. The efficacy of the proposed method is evaluated through extensive experiments that quantify counting performance under multiple evaluation criteria.
KW - Crowd counting
KW - Deep Learning
KW - Density level classification
KW - Inception module
KW - Multi-task learning
KW - Estimation
KW - Task analysis
KW - Feature extraction
KW - Decoding
KW - Image resolution
KW - Training
KW - Linear programming
UR - http://www.scopus.com/inward/record.url?scp=85076740271&partnerID=8YFLogxK
U2 - 10.1109/SMC.2019.8914497
DO - 10.1109/SMC.2019.8914497
M3 - Conference contribution
AN - SCOPUS:85076740271
T3 - Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
SP - 3019
EP - 3024
BT - 2019 IEEE International Conference on Systems, Man and Cybernetics, SMC 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE International Conference on Systems, Man and Cybernetics, SMC 2019
Y2 - 6 October 2019 through 9 October 2019
ER -