Language-Guided Negative Sample Mining for Open-Vocabulary Object Detection

Yu Wen Tseng, Hong Han Shuai, Ching Chun Huang, Yung Hui Li, Wen Huang Cheng

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Object detection is a fundamental perception task in computer vision. Traditional object detectors cannot recognize object classes that are absent from their training data. This is a significant drawback in practical applications, where novel objects are common. To address this lack of adaptability, paradigms such as zero-shot and open-vocabulary object detection have been introduced. Open-vocabulary object detection, in particular, often requires auxiliary image-text paired data during training. We propose an approach that refines training by mining potentially unlabeled objects from the pool of negative samples. Using a large-scale vision-language model, we apply the entropy of its classification scores to select and label previously unlabeled samples, which are then incorporated into training. With this method, our model achieves performance competitive with state-of-the-art results on the challenging MSCOCO dataset, without requiring additional data or supplementary training procedures.
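The abstract does not give the exact selection rule, so the following is only a minimal sketch of entropy-based negative sample mining under stated assumptions. It assumes that (a) each region proposal has a vector of logits, such as similarity scores between the region embedding and the text embeddings of the class vocabulary produced by a vision-language model; (b) a proposal is "negative" when it was not matched to any ground-truth box; and (c) a *low* entropy of the softmax distribution (a confident prediction) marks a negative proposal as a likely unlabeled object, which then receives the arg-max class as its label. The function names and the `max_entropy` and `temperature` parameters are hypothetical, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over one proposal's class logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats); zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mine_negative_samples(
    similarity_logits: Sequence[Sequence[float]],
    is_negative: Sequence[bool],
    max_entropy: float,
    temperature: float = 1.0,
) -> List[Tuple[int, int]]:
    """Return (proposal_index, pseudo_label) pairs for confident negatives.

    Hypothetical rule (assumption): a negative proposal whose class
    distribution has entropy below `max_entropy` is treated as an
    unlabeled object and labeled with its most probable class.
    """
    mined = []
    for i, (logits, neg) in enumerate(zip(similarity_logits, is_negative)):
        if not neg:
            continue  # positives already have ground-truth labels
        probs = softmax(logits, temperature)
        if entropy(probs) < max_entropy:
            label = max(range(len(probs)), key=probs.__getitem__)
            mined.append((i, label))
    return mined


if __name__ == "__main__":
    logits = [
        [5.0, 0.0, 0.0],  # negative, confident -> mined as class 0
        [1.0, 1.0, 1.0],  # negative, uniform (entropy ln 3) -> rejected
        [0.0, 6.0, 0.0],  # positive -> skipped regardless of confidence
    ]
    negatives = [True, True, False]
    print(mine_negative_samples(logits, negatives, max_entropy=0.5))  # [(0, 0)]
```

The entropy threshold trades recall of unlabeled objects against the risk of injecting wrong labels; the pseudo-labeled proposals would then be treated as positives in the detection loss instead of as background.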

Original language: English
Title of host publication: 2024 International Conference on Electronics, Information, and Communication, ICEIC 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350371888
State: Published - 2024
Event: 2024 International Conference on Electronics, Information, and Communication, ICEIC 2024 - Taipei, Taiwan
Duration: 28 Jan 2024 to 31 Jan 2024

Publication series

Name: 2024 International Conference on Electronics, Information, and Communication, ICEIC 2024

Conference

Conference: 2024 International Conference on Electronics, Information, and Communication, ICEIC 2024
Country/Territory: Taiwan
City: Taipei
Period: 28/01/24 to 31/01/24

Keywords

  • Open-vocabulary detection
  • large vision-language model
  • negative sample mining
