Monocular 3D Localization of Vehicles in Road Scenes

Haotian Zhang, Haorui Ji, Aotian Zheng, Jenq Neng Hwang, Ren Hung Hwang

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

6 Scopus citations

Abstract

Sensing and perception systems for autonomous driving vehicles in road scenes are composed of three crucial components: 3D-based object detection, tracking, and localization. While all three components are important, most relevant papers tend to only focus on one single component. We propose a monocular vision-based framework for 3D-based detection, tracking, and localization by effectively integrating all three tasks in a complementary manner. Our system contains an RCNN-based Localization Network (LOCNet), which works in concert with fitness evaluation score (FES) based single-frame optimization, to get more accurate and refined 3D vehicle localization. To better utilize the temporal information, we further use a multi-frame optimization technique, taking advantage of camera ego-motion and a 3D TrackletNet Tracker (3D TNT), to improve both accuracy and consistency in our 3D localization results. Our system outperforms state-of-the-art image-based solutions in diverse scenarios and is even comparable with LiDAR-based methods.

Original languageEnglish
Title of host publicationProceedings - 2021 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2021
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages2855-2864
Number of pages10
ISBN (Electronic)9781665401913
DOIs
StatePublished - 2021
Event18th IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2021 - Virtual, Online, Canada
Duration: 11 Oct 202117 Oct 2021

Publication series

NameProceedings of the IEEE International Conference on Computer Vision
Volume2021-October
ISSN (Print)1550-5499

Conference

Conference18th IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2021
Country/TerritoryCanada
CityVirtual, Online
Period11/10/2117/10/21

Fingerprint

Dive into the research topics of 'Monocular 3D Localization of Vehicles in Road Scenes'. Together they form a unique fingerprint.

Cite this