Feature Engineering and Resampling Strategies for Fund Transfer Fraud with Limited Transaction Data and a Time-Inhomogeneous Modi Operandi

Yu Yen Hsin, Tian Shyr Dai*, Yen Wu Ti, Ming Chuan Huang, Ting Hui Chiang, Liang Chih Liu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Detecting financial fraud to profile crimes and pinpoint system vulnerabilities is an essential issue in the financial industry. Because of interpretability requirements and the lack of mass transaction data due to privacy regulations, sophisticated handcrafted features have been adopted in much of the literature for fraud detection. In addition to established recency, frequency, monetary, and anomaly features, we propose behavior- and segmentation-type features based on statistical characteristics belonging solely to (non-)fraudulent accounts informed by financial expertise. Our proposed features are difficult for automatic feature generators to synthesize, and provide transparent cause-effect relationships and good prediction results. Features with time-inhomogeneous properties cause popular boosting classifiers such as XGBoost and LGBM to produce unstable detection results. We use the Kolmogorov-Smirnov test to detect and remove these features to improve XGBoost and LGBM detection performance and robustness. The resulting performance shown in our experiments is better than that of other classifiers, such as SVM and random forests. We examine the advantage of our technique by comparing it with several feature engineering works on fraud detection and automatic feature generation methods. On the other hand, we also find that generating training/testing sets with random sampling falsely eliminates such time inhomogeneity and results in misleading assessments of the robustness of machine learning models. These time-inhomogeneous phenomena also entail various modus operandi patterns, which influence the performance of different resampling methods for addressing data imbalance in fraud detection. Improper linear interpolation of SMOTE-related approaches leads to poor performance due to varying patterns of modi operandi. However, synthesizing fraudulent samples with simple oversampling and GANs mitigates this problem.
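The Kolmogorov-Smirnov filtering step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a feature counts as time-inhomogeneous when its distribution in an earlier period differs significantly from its distribution in a later period, and the feature names, the toy data, and the `alpha` threshold are all hypothetical. The p-value uses the large-sample Kolmogorov approximation, so it is only indicative for small samples.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:  # the maximum gap is attained at an observed value
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value from the Kolmogorov distribution (an approximation)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:  # the series converges poorly here; the p-value is ~1
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def stable_features(early, late, alpha=0.05):
    """Keep features whose early/late distributions are not significantly different.

    `early` and `late` map feature name -> list of values (hypothetical layout).
    """
    kept = []
    for name in early:
        d = ks_statistic(early[name], late[name])
        p = ks_pvalue(d, len(early[name]), len(late[name]))
        if p >= alpha:  # no evidence of drift at level alpha
            kept.append(name)
    return kept


# Toy, deterministic data: one feature is unchanged over time, one shifts by 0.5.
base = [i / 200 for i in range(200)]
early = {"stable_feature": base, "drifting_feature": base}
late = {"stable_feature": list(base),
        "drifting_feature": [v + 0.5 for v in base]}

kept = stable_features(early, late)
print(kept)
```

In this toy setup only the unchanged feature survives the filter. Note that the early and late periods must come from a chronological split; as the abstract points out, random sampling would mix the periods and hide the drift this check is meant to detect.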

Original language: English
Pages (from-to): 86101-86116
Number of pages: 16
Journal: IEEE Access
Volume: 10
State: Published - 2022

Keywords

  • Electronic fund transfer fraud detection
  • feature engineering
  • feature importance ranking
  • Kolmogorov-Smirnov test
  • resampling
