Multimodal Prompting with Missing Modalities for Visual Recognition

Yi Lun Lee, Yi Hsuan Tsai, Wei Chen Chiu, Chen Yu Lee

Research output: Conference contribution, peer-reviewed

9 citations (Scopus)

Abstract

In this paper, we tackle two challenges in multimodal learning for visual recognition: 1) a modality is missing during training or testing, as often happens in real-world situations; and 2) computational resources are insufficient to fine-tune heavy transformer models. To this end, we propose to use prompt learning to mitigate both challenges together. Specifically, our modality-missing-aware prompts can be plugged into multimodal transformers to handle general missing-modality cases, while requiring fewer than 1% of the learnable parameters needed to train the entire model. We further explore the effect of different prompt configurations and analyze robustness to missing modalities. Extensive experiments show that our prompt learning framework improves performance under various missing-modality cases while alleviating the need for heavy model retraining. Code is available at https://github.com/YiLunLee/missing-aware-prompts
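The abstract describes prompts that are chosen according to which modality is missing and plugged into a frozen multimodal transformer, so that only the prompts are trained. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation: it assumes an image-text pair with three missing-modality cases, one prompt per case prepended to the input token sequence of a single layer, plain Python lists standing in for tensors, and hypothetical names (`MissingAwarePrompts`, `missing_case`, `prepend_prompt`, `learnable_fraction`) and sizes. The backbone parameter count passed to `learnable_fraction` is a placeholder.

```python
import random

# Hypothetical missing-modality cases for an image-text pair.
CASES = ("complete", "text_missing", "image_missing")


def missing_case(has_text: bool, has_image: bool) -> str:
    """Map which modalities are present to a missing-modality case."""
    if has_text and has_image:
        return "complete"
    if has_image:
        return "text_missing"
    if has_text:
        return "image_missing"
    raise ValueError("at least one modality must be present")


class MissingAwarePrompts:
    """One learnable prompt (prompt_len x dim) per missing-modality case.

    In this sketch only these prompts would be trained; the transformer
    backbone is assumed frozen. Lists stand in for tensors so the example
    has no dependencies.
    """

    def __init__(self, prompt_len: int, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.prompt_len = prompt_len
        self.dim = dim
        # Small random initialization (a hypothetical choice).
        self.table = {
            case: [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                   for _ in range(prompt_len)]
            for case in CASES
        }

    def num_params(self) -> int:
        """Total number of prompt parameters across all cases."""
        return len(CASES) * self.prompt_len * self.dim


def prepend_prompt(prompts, tokens, has_text: bool, has_image: bool):
    """Prepend the prompt that matches the missing-modality case."""
    case = missing_case(has_text, has_image)
    return prompts.table[case] + tokens


def learnable_fraction(prompts, backbone_params: int) -> float:
    """Share of all parameters that are trained when the backbone is frozen."""
    n = prompts.num_params()
    return n / (n + backbone_params)


if __name__ == "__main__":
    prompts = MissingAwarePrompts(prompt_len=16, dim=768)
    # Stand-in for 40 embedded image tokens of an input whose text is missing.
    tokens = [[0.0] * 768 for _ in range(40)]
    seq = prepend_prompt(prompts, tokens, has_text=False, has_image=True)
    print(len(seq))  # 56
    print(prompts.num_params())  # 36864
```

With these assumed sizes the prompt table holds 3 × 16 × 768 = 36,864 parameters, which is a small fraction of a transformer backbone with tens of millions of parameters; the exact share reported in the paper depends on its own prompt configuration.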

Original language: English
Host publication title: Proceedings - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023
Publisher: IEEE Computer Society
Pages: 14943-14952
Number of pages: 10
ISBN (electronic): 9798350301298
Publication status: Published - 2023
Event: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023 - Vancouver, Canada
Duration: 18 Jun 2023 - 22 Jun 2023

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume: 2023-June
ISSN (print): 1063-6919

Conference

Conference: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023
Country/Territory: Canada
City: Vancouver
Period: 18/06/23 - 22/06/23
