MENTOR: Multilingual Text Detection Toward Learning by Analogy

Hsin Ju Lin, Tsu Chun Chung, Ching Chun Hsiao, Pin Yu Chen, Wei Chen Chiu, Ching Chun Huang

Research output: Conference contribution (peer-reviewed)

Abstract

Text detection is frequently used in vision-based mobile robots that need to interpret text in their surroundings to perform a given task. For instance, delivery robots in multilingual cities must perform multilingual text detection so that they can read traffic signs and road markings. Moreover, the target languages change from region to region, implying the need to efficiently re-train the models to recognize novel languages. However, collecting and labeling training data for novel languages is cumbersome, and the effort required to re-train an existing text detector is considerable. Even worse, this routine must be repeated whenever a novel language appears. This motivates us to propose a new problem setting for tackling these challenges more efficiently: 'We ask for a generalizable multilingual text detection framework to detect and identify both seen and unseen language regions inside scene images without the requirement of collecting supervised training data for unseen languages as well as model re-training'. To this end, we propose 'MENTOR', the first work to realize a learning strategy between zero-shot learning and few-shot learning for multilingual scene text detection. During the training phase, we leverage 'zero-cost' synthesized printed texts and the available training (seen) languages to learn the meta-mapping from printed texts to language-specific kernel weights. Meanwhile, dynamic convolution networks guided by the language-specific kernel are trained to realize a detection-by-feature-matching scheme. In the inference phase, 'zero-cost' printed texts are synthesized for a new target language. By utilizing the learned meta-mapping and the matching network, 'MENTOR' can freely identify the text regions of the new language. Experiments show that our model achieves results comparable to supervised methods on seen languages and outperforms other methods in detecting unseen languages.
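The detection-by-feature-matching idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: in the paper the meta-mapping and the dynamic convolution networks are learned, whereas here the mapping is replaced by a simple average of feature vectors, and the matching step is a 1x1 dynamic convolution computed as cosine similarity. All function names, the feature values, and the threshold are hypothetical and chosen only for illustration.

```python
from math import sqrt


def l2_normalize(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def kernel_from_printed_text(glyph_features):
    """Hypothetical stand-in for the learned meta-mapping: average the
    feature vectors of synthesized printed-text samples into one kernel."""
    dim = len(glyph_features[0])
    count = len(glyph_features)
    mean = [sum(f[i] for f in glyph_features) / count for i in range(dim)]
    return l2_normalize(mean)


def match_map(feature_map, kernel):
    """1x1 dynamic convolution: cosine similarity between the kernel and
    the feature vector at every spatial position of the scene."""
    return [
        [sum(a * b for a, b in zip(l2_normalize(pixel), kernel)) for pixel in row]
        for row in feature_map
    ]


def detect(feature_map, kernel, threshold=0.8):
    """Mark positions whose similarity to the kernel reaches the threshold."""
    return [[score >= threshold for score in row]
            for row in match_map(feature_map, kernel)]


# Made-up features for two printed-text samples of an "unseen" language.
glyphs = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0]]
kernel = kernel_from_printed_text(glyphs)

# Made-up 2x2 scene feature map with a 3-dimensional feature per position.
scene = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [2.0, 0.1, 0.1]],
]
mask = detect(scene, kernel)
print(mask)  # [[True, False], [False, True]]
```

The point of the sketch is only that the kernel is produced from printed-text samples at inference time, so supporting a new language requires new synthesized samples rather than re-training the matching step.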

Original language: English
Title of host publication: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3248-3255
Number of pages: 8
ISBN (electronic): 9781665491907
Publication status: Published - 2023
Event: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023 - Detroit, United States
Duration: 1 Oct 2023 to 5 Oct 2023

Publication series

Name: IEEE International Conference on Intelligent Robots and Systems
ISSN (print): 2153-0858
ISSN (electronic): 2153-0866

Conference

Conference: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023
Country/Region: United States
City: Detroit
Period: 1 Oct 2023 to 5 Oct 2023
