TY - JOUR
T1 - Interpretable Electronic Transfer Fraud Detection with Expert Feature Constructions
AU - Hsin, Yu Yen
AU - Dai, Tian Shyr
AU - Ti, Yen Wu
AU - Huang, Ming Chuan
N1 - Publisher Copyright: © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). This research was funded by the Ministry of Science and Technology (MOST) of Taiwan under grants MOST 109-2218-E-009-015. CEUR Workshop Proceedings (CEUR-WS.org)
PY - 2021
Y1 - 2021
AB - Because the magnitude of financial fraud is growing rapidly while clearance rates remain low, detecting and preventing fraud has been a tremendous challenge for financial institutions. Both detection performance and interpretability are critical for profiling the fraudsters' modus operandi and for spotting vulnerabilities in financial systems and processes. Traditional rule-based approaches yield poor detection performance. Recent machine learning methods typically generate recency, frequency, and temporal features to extract patterns from raw transaction data. In contrast, this paper generates behavioral features and (financial organization's) segmentation features based on financial expertise and on characteristics that belong solely to (non-)fraudulent accounts. While inputting the aforementioned features into different models and using accumulated features from past literature generate unstable prediction results, our features generate the best and most stable results for decision-tree-based approaches such as Extreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine. Using the Kolmogorov-Smirnov test, we find that the unstable predictive results are caused by vastly different feature distributions in the training and testing sets, which reflect the fast-changing modus operandi. Thus, generating training and testing sets by random sampling (rather than by chronological separation) is improper for modeling time-varying data. Combining XGBoost with our expertise-based features provides a clear cause-and-effect relationship between features and fraud labels for further interpretation. The high precision and recall rates allow banks to save screening labor costs and to identify fraud without interfering with normal transactions. The quality of our features is supported by the fact that they occupy three of the five most important features under the ranking procedure of Butaru et al., published in a leading finance journal [Journal of Banking and Finance (72) 218-239 (2016)].
KW - Boosted decision tree
KW - Electronic transfer fraud detection
KW - Feature engineering
KW - Interpretability
UR - http://www.scopus.com/inward/record.url?scp=85122883392&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85122883392
SN - 1613-0073
VL - 3052
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 2021 International Conference on Information and Knowledge Management Workshops, CIKMW 2021
Y2 - 1 November 2021 through 5 November 2021
ER -