TY - JOUR
T1 - MVSNet++: Learning Depth-Based Attention Pyramid Features for Multi-View Stereo
AU - Chen, Po-Heng
AU - Yang, Hsiao-Chien
AU - Chen, Kuan-Wen
AU - Chen, Yong-Sheng
PY - 2020/6/12
Y1 - 2020/6/12
N2 - The goal of Multi-View Stereo (MVS) is to reconstruct a 3D point-cloud model from multiple views. Building on the considerable progress of deep learning, an increasing amount of research has moved from traditional MVS methods to learning-based ones. However, two issues remain unsolved in the existing state-of-the-art methods: (1) only high-level information is considered for depth estimation, which may reduce the localization accuracy of 3D points because the learned model lacks spatial information; and (2) most methods require additional post-processing or network refinement to generate a smooth 3D model, which significantly increases the number of model parameters or the computational complexity. To this end, we propose MVSNet++, an end-to-end trainable network for dense depth estimation. The estimated depth map can further be applied to 3D model reconstruction. Unlike previous methods, we first adopt feature pyramid structures for both feature extraction and cost volume regularization, which leads to accurate 3D point localization by fusing multi-level information. To generate a smooth depth map, we then carefully integrate instance normalization into MVSNet++ without increasing the number of model parameters or the computational burden. Furthermore, we design three loss functions and integrate a Curriculum Learning framework into the training process, leading to an accurate reconstruction of the 3D model. MVSNet++ is evaluated on the DTU and Tanks and Temples benchmarks with comprehensive ablation studies. Experimental results demonstrate that the proposed method performs favorably against previous state-of-the-art methods, showing the accuracy and effectiveness of MVSNet++.
KW - 3D model reconstruction
KW - deep learning
KW - feature aggregation
KW - multi-view stereo
KW - plane sweep algorithm
UR - http://www.scopus.com/inward/record.url?scp=85088298572&partnerID=8YFLogxK
DO - 10.1109/TIP.2020.3000611
M3 - Article
AN - SCOPUS:85088298572
VL - 29
SP - 7261
EP - 7273
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
SN - 1057-7149
M1 - 9115828
ER -