TY - JOUR
T1 - Bridging accuracy and interpretability
T2 - A rescaled cluster-then-predict approach for enhanced credit scoring
AU - Teng, Huei Wen
AU - Kang, Ming Hsuan
AU - Lee, I. Han
AU - Bai, Le Chi
N1 - Publisher Copyright: © 2023
PY - 2024/1
Y1 - 2024/1
AB - Credit scoring is pivotal in the financial industry for assessing individuals’ creditworthiness and optimizing financial institutions’ risk-adjusted returns. While the XGBoost algorithm stands as the state-of-the-art classifier for credit scoring, its intricate nature impedes easy interpretation, a critical aspect for stakeholders’ decision-making. This paper introduces a novel approach termed the “Rescaled Cluster-then-Predict Method,” aimed at enhancing both the interpretability and predictive performance of credit scoring models. Our method employs a two-step process, initially rescaling the features and subsequently clustering the data into subgroups. We then employ Logistic Regression within each subgroup to generate predictions. The paper’s primary contributions are twofold. Firstly, empirical evaluations on two distinct datasets demonstrate that our proposed method attains performance competitive with XGBoost while substantially improving interpretability. Notably, in some instances, the Logistic Regression outperforms XGBoost. Secondly, we reveal that clustering solely the positive cases, as opposed to the entire dataset, yields comparable results while markedly reducing computational requirements. These insights hold significant practical implications for the financial industry, which consistently seeks credit scoring models that are not only accurate but also interpretable and computationally efficient.
KW - Cluster-then-predict
KW - Credit scoring
KW - Logistic Regression
KW - Rescaling
KW - XGBoost
UR - http://www.scopus.com/inward/record.url?scp=85174733845&partnerID=8YFLogxK
U2 - 10.1016/j.irfa.2023.103005
DO - 10.1016/j.irfa.2023.103005
M3 - Article
AN - SCOPUS:85174733845
SN - 1057-5219
VL - 91
JO - International Review of Financial Analysis
JF - International Review of Financial Analysis
M1 - 103005
ER -