TY - GEN
T1 - Alignment of deep features in 3D models for camera pose estimation
AU - Su, Jui-Yuan
AU - Cheng, Shyi-Chyi
AU - Chang, Chin-Chun
AU - Hsieh, Jun-Wei
N1 - Publisher Copyright:
© 2019, Springer Nature Switzerland AG.
PY - 2019
Y1 - 2019
N2 - Using a set of semantically annotated RGB-D images with known camera poses, many existing 3D reconstruction algorithms can integrate these images into a single 3D model of the scene. The semantically annotated scene model facilitates the construction of a video surveillance system using a moving camera, provided we can efficiently compute the depth maps of the captured images and estimate the poses of the camera. The proposed model-based video surveillance consists of two phases, i.e., a modeling phase and an inspection phase. In the modeling phase, we carefully calibrate the parameters of the camera that captures the multi-view video used to model the target 3D scene. In the inspection phase, however, the camera pose parameters and the depth maps of the captured RGB images are often unknown or noisy, since a moving camera is used to inspect the completeness of the object. In this paper, the 3D model is first transformed into a colored point cloud, which is then indexed by clustering, with each cluster representing a surface fragment of the scene. The clustering results are then used to train a model-specific convolutional neural network (CNN) that annotates each pixel of an input RGB image with the correct fragment class. The pre-stored camera parameters and depth information of the fragment classes are then fused to estimate the depth map and camera pose of the current input RGB image. Experimental results show that the proposed approach outperforms the compared methods in terms of camera pose estimation accuracy.
KW - 3D model
KW - 3D point cloud clustering
KW - Camera pose estimation
KW - Deep learning
KW - Unsupervised fragment classification
UR - http://www.scopus.com/inward/record.url?scp=85059847677&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-05716-9_36
DO - 10.1007/978-3-030-05716-9_36
M3 - Conference contribution
AN - SCOPUS:85059847677
SN - 9783030057152
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 440
EP - 452
BT - MultiMedia Modeling - 25th International Conference, MMM 2019, Proceedings
A2 - Huet, Benoit
A2 - Kompatsiaris, Ioannis
A2 - Vrochidis, Stefanos
A2 - Mezaris, Vasileios
A2 - Cheng, Wen-Huang
A2 - Gurrin, Cathal
PB - Springer Verlag
T2 - 25th International Conference on MultiMedia Modeling, MMM 2019
Y2 - 8 January 2019 through 11 January 2019
ER -