TY - GEN
T1 - Named Entity Recognition for Chinese Healthcare Applications
AU - Lee, Cheng Yen
AU - Su, Ming Hsiang
AU - Pleva, Matus
AU - Hládek, Daniel
AU - Liao, Yuan Fu
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Named Entity Recognition (NER) is a fundamental task in information extraction that locates and classifies predefined named entities in unstructured text. Chinese NER is more difficult than English NER: because there are no separators between Chinese characters, incorrectly segmented entity boundaries cause error propagation in NER. In this study, a named entity recognition system is constructed and applied in the Chinese medical domain, where Chinese medical datasets are labeled in BIO format. The Chinese HealthNER Corpus contains 33,89 sentences, of which 2531 sentences are assigned to the validation set and 3204 sentences are assigned to the test set. This study trains PyTorch Embedding + BiLSTM + CRF, RoBERTa + BiLSTM + CRF, BERT Classifier, and BERT + BiLSTM + CRF models and compares their performance. The BERT + BiLSTM + CRF model achieves the best prediction performance, with a precision of 91.30%, a recall of 89.46%, and an F1 score of 90.53%.
AB - Named Entity Recognition (NER) is a fundamental task in information extraction that locates and classifies predefined named entities in unstructured text. Chinese NER is more difficult than English NER: because there are no separators between Chinese characters, incorrectly segmented entity boundaries cause error propagation in NER. In this study, a named entity recognition system is constructed and applied in the Chinese medical domain, where Chinese medical datasets are labeled in BIO format. The Chinese HealthNER Corpus contains 33,89 sentences, of which 2531 sentences are assigned to the validation set and 3204 sentences are assigned to the test set. This study trains PyTorch Embedding + BiLSTM + CRF, RoBERTa + BiLSTM + CRF, BERT Classifier, and BERT + BiLSTM + CRF models and compares their performance. The BERT + BiLSTM + CRF model achieves the best prediction performance, with a precision of 91.30%, a recall of 89.46%, and an F1 score of 90.53%.
KW - Chinese language processing
KW - NLP
KW - medical domain
KW - named entity recognition
KW - text corpus
UR - http://www.scopus.com/inward/record.url?scp=85174902530&partnerID=8YFLogxK
U2 - 10.1109/ICCE-Taiwan58799.2023.10226876
DO - 10.1109/ICCE-Taiwan58799.2023.10226876
M3 - Conference contribution
AN - SCOPUS:85174902530
T3 - 2023 International Conference on Consumer Electronics - Taiwan, ICCE-Taiwan 2023 - Proceedings
SP - 749
EP - 750
BT - 2023 International Conference on Consumer Electronics - Taiwan, ICCE-Taiwan 2023 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 International Conference on Consumer Electronics - Taiwan, ICCE-Taiwan 2023
Y2 - 17 July 2023 through 19 July 2023
ER -