Abstract
Depth estimation from a monocular 360° image is an emerging problem that is gaining popularity due to the availability of consumer-level 360° cameras and their ability to sense the complete surroundings. While the standards for 360° imaging are still under rapid development, we propose to predict the depth map of a monocular 360° image by mimicking both the peripheral and foveal vision of the human eye. To this end, we adopt a two-branch neural network that leverages two common projections: equirectangular and cubemap. In particular, the equirectangular projection covers the complete field of view but introduces distortion, whereas the cubemap projection avoids distortion but introduces discontinuities at the boundaries of the cube faces. We therefore propose a bi-projection fusion scheme with learnable masks to balance the feature maps from the two projections. Moreover, for the cubemap projection, we propose a spherical padding procedure that mitigates the discontinuity at the boundary of each face. We apply our method to four panorama datasets and show favorable results against existing state-of-the-art methods.
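To make the fusion idea concrete, below is a minimal PyTorch sketch of what a bi-projection fusion block with learnable masks could look like. The module name `BiProjectionFusion`, the 1x1-conv mask design, the normalization, and the tensor shapes are illustrative assumptions, not the exact architecture from the paper; it assumes the cubemap-branch features have already been re-projected onto the same equirectangular layout as the equirectangular-branch features.

```python
# Hypothetical sketch of bi-projection fusion with learnable masks;
# the actual module in the paper may differ in design and details.
import torch
import torch.nn as nn

class BiProjectionFusion(nn.Module):
    """Blend equirectangular and cubemap feature maps with learnable masks.

    Assumes both inputs share an equirectangular layout of shape (B, C, H, W).
    """
    def __init__(self, channels: int):
        super().__init__()
        # One 1x1 conv per branch predicts a per-pixel confidence mask in (0, 1).
        self.mask_equi = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())
        self.mask_cube = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, feat_equi: torch.Tensor, feat_cube: torch.Tensor):
        m_e = self.mask_equi(feat_equi)  # (B, 1, H, W)
        m_c = self.mask_cube(feat_cube)  # (B, 1, H, W)
        # Normalize the masks so the two branches form a convex combination.
        total = m_e + m_c + 1e-6
        return (m_e * feat_equi + m_c * feat_cube) / total

# Usage: fuse two hypothetical 64-channel feature maps.
fusion = BiProjectionFusion(64)
f_e = torch.randn(2, 64, 32, 64)  # equirectangular-branch features
f_c = torch.randn(2, 64, 32, 64)  # cubemap features re-projected to equirect
out = fusion(f_e, f_c)            # (2, 64, 32, 64)
```

The per-pixel masks let the network weight each projection adaptively, e.g., trusting the distortion-free cubemap features away from face boundaries and the equirectangular features elsewhere.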
| Original language | English |
|---|---|
| Article number | 9157424 |
| Pages (from-to) | 459-468 |
| Number of pages | 10 |
| Journal | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
| DOIs | |
| State | Published - 2020 |
| Event | 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 - Virtual, Online, United States |
| Duration | 14 Jun 2020 - 19 Jun 2020 |