Incorporating Domain Knowledge into Language Transformers for Multi-Label Classification of Chinese Medical Questions

Po Han Chen, Yu Xiang Zeng, Lung Hao Lee

Research output: Conference contribution, peer-reviewed

1 citation (Scopus)

Abstract

In this paper, we propose a knowledge infusion mechanism to incorporate domain knowledge into language transformers. Weakly supervised data is regarded as the main source for knowledge acquisition. We pre-train the language models to capture masked knowledge of focuses and aspects and then fine-tune them to obtain better performance on downstream tasks. Due to the lack of publicly available datasets for multi-label classification of Chinese medical questions, we crawled questions from medical question/answer forums and manually annotated them using eight predefined classes: persons and organizations, symptom, cause, examination, disease, information, ingredient, and treatment. The resulting dataset contains a total of 1,814 questions with 2,340 labels, an average of 1.29 labels per question. We used Baidu Medical Encyclopedia as the knowledge resource. Two transformers, BERT and RoBERTa, were implemented to compare performance on our constructed dataset. Experimental results showed that our proposed model with the knowledge infusion mechanism achieves better performance regardless of which evaluation metric is considered: Macro F1, Micro F1, Weighted F1, or Subset Accuracy.
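The four evaluation metrics named in the abstract, and the labels-per-question average, can be illustrated with a minimal pure-Python sketch. It is not the authors' evaluation code: the function names and the toy gold/predicted label sets are hypothetical, and only three of the eight classes are used to keep the example short.

```python
from typing import Dict, List, Set

def label_cardinality(labels: List[Set[str]]) -> float:
    """Average number of labels per question."""
    return sum(len(s) for s in labels) / len(labels)

def subset_accuracy(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Fraction of questions whose predicted label set matches the gold set exactly."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def f1_scores(gold: List[Set[str]], pred: List[Set[str]],
              classes: List[str]) -> Dict[str, float]:
    """Micro, macro, and support-weighted F1 over a fixed class list."""
    def f1(tp: int, fp: int, fn: int) -> float:
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0

    per_class, support = {}, {}
    tp_all = fp_all = fn_all = 0
    for c in classes:
        tp = sum(c in g and c in p for g, p in zip(gold, pred))
        fp = sum(c not in g and c in p for g, p in zip(gold, pred))
        fn = sum(c in g and c not in p for g, p in zip(gold, pred))
        per_class[c] = f1(tp, fp, fn)
        support[c] = tp + fn  # number of gold occurrences of class c
        tp_all, fp_all, fn_all = tp_all + tp, fp_all + fp, fn_all + fn

    total = sum(support.values())
    return {
        # Micro: pool counts over all classes, then compute one F1.
        "micro": f1(tp_all, fp_all, fn_all),
        # Macro: unweighted mean of per-class F1.
        "macro": sum(per_class.values()) / len(classes),
        # Weighted: mean of per-class F1 weighted by gold support.
        "weighted": sum(per_class[c] * support[c] for c in classes) / total,
    }

# Hypothetical toy data, not taken from the paper's dataset.
classes = ["symptom", "cause", "treatment"]
gold = [{"symptom"}, {"symptom", "treatment"}, {"cause"}, {"treatment"}]
pred = [{"symptom"}, {"symptom"}, {"cause", "treatment"}, {"treatment"}]

scores = f1_scores(gold, pred, classes)
exact = subset_accuracy(gold, pred)
avg_labels = label_cardinality(gold)

# The figure reported in the abstract: 2,340 labels over 1,814 questions.
reported_avg = round(2340 / 1814, 2)
```

Subset Accuracy is the strictest of the four, since a question counts as correct only when every one of its labels is predicted and no extra label is added. Macro F1 treats all classes equally, while Micro and Weighted F1 are dominated by the more frequent classes.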

Original language: English
Title of host publication: ROCLING 2021 - Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing
Editors: Lung-Hao Lee, Chia-Hui Chang, Kuan-Yu Chen
Publisher: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages: 265-270
Number of pages: 6
ISBN (Electronic): 9789869576949
Publication status: Published - 2021
Event: 33rd Conference on Computational Linguistics and Speech Processing, ROCLING 2021 - Taoyuan, Taiwan
Duration: 15 Oct 2021 to 16 Oct 2021

Publication series

Name: ROCLING 2021 - Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing

Conference

Conference: 33rd Conference on Computational Linguistics and Speech Processing, ROCLING 2021
Country/Territory: Taiwan
City: Taoyuan
Period: 15/10/21 to 16/10/21
