TY - JOUR
T1 - Bankruptcy prediction using machine learning models with the text-based communicative value of annual reports
AU - Chen, Tsung Kang
AU - Liao, Hsien Hsing
AU - Chen, Geng Dao
AU - Kang, Wei Han
AU - Lin, Yu Chun
N1 - Publisher Copyright: © 2023 Elsevier Ltd
PY - 2023/12/15
Y1 - 2023/12/15
N2 - We investigate whether including the text-based communicative value of annual reports increases the predictive power of four machine learning models (Logistic Regression, Random Forest, XGBoost, and Support Vector Machine) for corporate bankruptcy prediction, using U.S. firm observations from 1994 to 2018. We find that the overall prediction effectiveness of these four models (e.g., accuracy, F1-score, and AUC) improves significantly, especially for the XGBoost and Random Forest models. In addition, we find that annual report text-based communicative value variables significantly reduce the models’ Type II error while keeping the Type I error at a relatively low level, especially for short-term bankruptcy forecasts. The results reveal that annual report text-based communicative value effectively mitigates the misidentification of non-bankrupt firms as bankrupt firms. Our results also suggest that annual report text-based communicative value is helpful for banks' corporate loan underwriting decisions. Finally, our findings still hold when we consider different testing periods and random state settings, replace the dataset with another publicly available bankruptcy dataset, and introduce neural network models.
AB - We investigate whether including the text-based communicative value of annual reports increases the predictive power of four machine learning models (Logistic Regression, Random Forest, XGBoost, and Support Vector Machine) for corporate bankruptcy prediction, using U.S. firm observations from 1994 to 2018. We find that the overall prediction effectiveness of these four models (e.g., accuracy, F1-score, and AUC) improves significantly, especially for the XGBoost and Random Forest models. In addition, we find that annual report text-based communicative value variables significantly reduce the models’ Type II error while keeping the Type I error at a relatively low level, especially for short-term bankruptcy forecasts. The results reveal that annual report text-based communicative value effectively mitigates the misidentification of non-bankrupt firms as bankrupt firms. Our results also suggest that annual report text-based communicative value is helpful for banks' corporate loan underwriting decisions. Finally, our findings still hold when we consider different testing periods and random state settings, replace the dataset with another publicly available bankruptcy dataset, and introduce neural network models.
KW - Annual report text-based communicative value
KW - Bankruptcy prediction
KW - Credit risk
KW - Incomplete information
KW - Machine learning
UR - http://www.scopus.com/inward/record.url?scp=85164221081&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2023.120714
DO - 10.1016/j.eswa.2023.120714
M3 - Article
AN - SCOPUS:85164221081
SN - 0957-4174
VL - 233
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 120714
ER -