CMAF: Cross-Modal Augmentation via Fusion for Underwater Acoustic Image Recognition

Shih Wei Yang*, Li Hsiang Shen, Hong Han Shuai*, Kai Ten Feng*

*Corresponding author for this work

Research output: Article, peer-reviewed

2 Citations (Scopus)

Abstract

Underwater image recognition is crucial for underwater detection applications, and fish classification has become an emerging research area in recent years. Existing image classification models are usually trained on data collected from terrestrial environments, which makes them unsuitable for underwater images, whose features are often incomplete and noisy. To address this, we propose a cross-modal augmentation via fusion (CMAF) framework for acoustic-based fish image classification. Our approach separates the process into two branches, a visual modality and a sonar signal modality, where the latter provides a complementary character feature. We augment the visual modality, design an attention-based fusion module, and adopt a masking-based training strategy with a mask-based focal loss to improve the learning of local features and address the class imbalance problem. Our proposed method outperforms state-of-the-art methods. Our source code is available at https://github.com/WilkinsYang/CMAF.
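For readers unfamiliar with focal loss, the sketch below shows the standard multi-class form, which down-weights well-classified examples so that hard or rare classes contribute more to training. It is only an illustration: the abstract does not specify how the mask-based variant is defined, so the masking step is omitted, and the function names and the defaults (`gamma=2.0`, the optional `alpha` class weights) are assumptions, not taken from the CMAF code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Standard focal loss for one sample (hypothetical helper).

    logits: raw class scores; target: index of the true class.
    gamma:  focusing parameter; gamma=0 reduces to cross-entropy.
    alpha:  optional per-class weights for class imbalance.
    """
    p_t = max(softmax(logits)[target], 1e-12)  # clamp to avoid log(0)
    weight = (1.0 - p_t) ** gamma              # small when p_t is near 1
    if alpha is not None:
        weight *= alpha[target]
    return -weight * math.log(p_t)


# A confidently correct sample contributes far less than a misclassified one.
easy = focal_loss([4.0, 0.0, 0.0], target=0)
hard = focal_loss([0.0, 4.0, 0.0], target=0)
print(easy < hard)
```

Setting `gamma=0.0` recovers plain cross-entropy, which is a convenient sanity check when comparing against a baseline loss.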

Original language: English
Article number: 124
Journal: ACM Transactions on Multimedia Computing, Communications and Applications
Volume: 20
Issue number: 5
DOIs
Publication status: Published - 11 Jan 2024
