TY - JOUR
T1 - Evaluating a Natural Language Processing–Driven, AI-Assisted International Classification of Diseases, 10th Revision, Clinical Modification, Coding System for Diagnosis Related Groups in a Real Hospital Environment
T2 - Algorithm Development and Validation Study
AU - Dai, Hong Jie
AU - Wang, Chen Kai
AU - Chen, Chien Chang
AU - Liou, Chong Sin
AU - Lu, An Tai
AU - Lai, Chia Hsin
AU - Shain, Bo Tsz
AU - Ke, Cheng Rong
AU - Wang, William Yu Chung
AU - Mir, Tatheer Hussain
AU - Simanjuntak, Mutiara
AU - Kao, Hao Yun
AU - Tsai, Ming Ju
AU - Tseng, Vincent S.
N1 - Publisher Copyright: ©Hong-Jie Dai, Chen-Kai Wang, Chien-Chang Chen, Chong-Sin Liou, An-Tai Lu, Chia-Hsin Lai, Bo-Tsz Shain, Cheng-Rong Ke, William Yu Chung Wang, Tatheer Hussain Mir, Mutiara Simanjuntak, Hao-Yun Kao, Ming-Ju Tsai, Vincent S Tseng.
PY - 2024
Y1 - 2024
N2 - Background: International Classification of Diseases codes are widely used to describe diagnosis information, but manual coding relies heavily on human interpretation, which can be expensive, time consuming, and prone to errors. With the transition from the International Classification of Diseases, Ninth Revision, to the International Classification of Diseases, Tenth Revision (ICD-10), the coding process has become more complex, highlighting the need for automated approaches to enhance coding efficiency and accuracy. Inaccurate coding can result in substantial financial losses for hospitals, so a precise assessment of the outcomes generated by a natural language processing (NLP)–driven autocoding system is critical to safeguarding the accuracy of the Taiwan diagnosis related groups (Tw-DRGs). Objective: This study aims to evaluate the feasibility of applying an International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM), autocoding system that can automatically determine diagnoses and codes based on free-text discharge summaries to facilitate the assessment of Tw-DRGs, specifically principal diagnosis and major diagnostic categories (MDCs). Methods: Using patient discharge summaries from Kaohsiung Medical University Chung-Ho Memorial Hospital (KMUCHH) from April 2019 to December 2020 as a reference data set, we developed artificial intelligence (AI)–assisted ICD-10-CM coding systems based on deep learning models. We constructed a web-based user interface for the AI-assisted coding system and deployed the system in the workflow of the certified coding specialists (CCSs) of KMUCHH. The data used for the assessment of Tw-DRGs were manually curated by a CCS, with the principal diagnosis and MDC determined from discharge summaries collected at KMUCHH from February 2023 to April 2023. Results: Both the reference data set and real hospital data were used to assess performance in determining ICD-10-CM coding, principal diagnosis, and MDC for Tw-DRGs. Among all methods, the GPT-2 (OpenAI)-based model achieved the highest F1-score, 0.667 (F1-score 0.851 for the top 50 codes), on the KMUCHH test set and a slightly lower F1-score, 0.621, on real hospital data. Cohen κ evaluation of the agreement on MDC between the models and the CCS revealed that the overall average κ value for GPT-2 (κ=0.714) was approximately 0.122 higher than that of the hierarchy attention network (κ=0.592). GPT-2 demonstrated superior agreement with the CCS across 6 categories of MDC, with an average κ value of approximately 0.869 (SD 0.033), underscoring the effectiveness of the developed AI-assisted coding system in supporting the work of CCSs. Conclusions: An NLP-driven AI-assisted coding system can assist CCSs in ICD-10-CM coding by offering coding references via a user interface, demonstrating the potential to reduce the manual workload and expedite Tw-DRG assessment. Consistency in performance affirmed the effectiveness of the system in supporting CCSs in ICD-10-CM coding and the judgment of Tw-DRGs.
KW - deep learning
KW - electronic medical record
KW - International Classification of Diseases
KW - natural language processing
KW - Taiwan diagnosis related groups
UR - http://www.scopus.com/inward/record.url?scp=85204440358&partnerID=8YFLogxK
U2 - 10.2196/58278
DO - 10.2196/58278
M3 - Article
C2 - 39302714
AN - SCOPUS:85204440358
SN - 1438-8871
VL - 26
JO - Journal of Medical Internet Research
JF - Journal of Medical Internet Research
M1 - e58278
ER -