TY - GEN
T1 - ES3Net
T2 - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2023
AU - Fang, I-Sheng
AU - Wen, Hsiao-Chieh
AU - Hsu, Chia-Lun
AU - Jen, Po-Chung
AU - Chen, Ping-Yang
AU - Chen, Yong-Sheng
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Efficient and accurate depth estimation is crucial for real-world embedded vision applications, such as autonomous driving, 3D reconstruction, and drone navigation. Stereo matching is considered more accurate than monocular depth estimation due to the presence of a reference image, but its computational inefficiency poses a challenge for deployment on edge devices. Moreover, it is difficult to acquire ground-truth depths for supervised training of stereo matching networks. To address these challenges, we propose the Edge-based Self-Supervised Stereo matching Network (ES3Net), which efficiently estimates accurate depths without ground-truth depths for training. We introduce dual disparity to transform an efficient supervised stereo matching network into a self-supervised learning framework. Comprehensive experimental results demonstrate that ES3Net achieves accuracy comparable to stereo methods while outperforming monocular methods in inference time, approaching state-of-the-art performance. More specifically, our method improves RMSElog by over 40% compared to monocular methods, while having 1500 times fewer parameters and running four times faster on an NVIDIA Jetson TX2. The efficient and reliable estimation of depths on edge devices using ES3Net lays a good foundation for safe drone navigation.
AB - Efficient and accurate depth estimation is crucial for real-world embedded vision applications, such as autonomous driving, 3D reconstruction, and drone navigation. Stereo matching is considered more accurate than monocular depth estimation due to the presence of a reference image, but its computational inefficiency poses a challenge for deployment on edge devices. Moreover, it is difficult to acquire ground-truth depths for supervised training of stereo matching networks. To address these challenges, we propose the Edge-based Self-Supervised Stereo matching Network (ES3Net), which efficiently estimates accurate depths without ground-truth depths for training. We introduce dual disparity to transform an efficient supervised stereo matching network into a self-supervised learning framework. Comprehensive experimental results demonstrate that ES3Net achieves accuracy comparable to stereo methods while outperforming monocular methods in inference time, approaching state-of-the-art performance. More specifically, our method improves RMSElog by over 40% compared to monocular methods, while having 1500 times fewer parameters and running four times faster on an NVIDIA Jetson TX2. The efficient and reliable estimation of depths on edge devices using ES3Net lays a good foundation for safe drone navigation.
UR - http://www.scopus.com/inward/record.url?scp=85170829952&partnerID=8YFLogxK
U2 - 10.1109/CVPRW59228.2023.00470
DO - 10.1109/CVPRW59228.2023.00470
M3 - Conference contribution
AN - SCOPUS:85170829952
T3 - IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
SP - 4472
EP - 4481
BT - Proceedings - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2023
PB - IEEE Computer Society
Y2 - 18 June 2023 through 22 June 2023
ER -