TY - GEN
T1 - Monocular 3D Localization of Vehicles in Road Scenes
AU - Zhang, Haotian
AU - Ji, Haorui
AU - Zheng, Aotian
AU - Hwang, Jenq-Neng
AU - Hwang, Ren-Hung
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
AB - Sensing and perception systems for autonomous driving vehicles in road scenes are composed of three crucial components: 3D-based object detection, tracking, and localization. While all three components are important, most relevant papers tend to focus on only a single one. We propose a monocular vision-based framework for 3D-based detection, tracking, and localization that effectively integrates all three tasks in a complementary manner. Our system contains an RCNN-based Localization Network (LOCNet), which works in concert with fitness evaluation score (FES)-based single-frame optimization to obtain more accurate and refined 3D vehicle localization. To better exploit temporal information, we further apply a multi-frame optimization technique, taking advantage of camera ego-motion and a 3D TrackletNet Tracker (3D TNT), to improve both the accuracy and consistency of our 3D localization results. Our system outperforms state-of-the-art image-based solutions in diverse scenarios and is even comparable with LiDAR-based methods.
UR - http://www.scopus.com/inward/record.url?scp=85123053642&partnerID=8YFLogxK
U2 - 10.1109/ICCVW54120.2021.00320
DO - 10.1109/ICCVW54120.2021.00320
M3 - Conference contribution
AN - SCOPUS:85123053642
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 2855
EP - 2864
BT - Proceedings - 2021 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 18th IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2021
Y2 - 11 October 2021 through 17 October 2021
ER -