Performance Evaluation of Ensemble Learning Models for Credit Card Fraud Detection on Balanced and Imbalanced Datasets
Authors
Md. Muktar Hossain
(Computer Science and Engineering)
Abstract
Detecting credit card fraud remains a significant challenge in financial security, mainly because of the extreme class imbalance in transaction data and the evolving strategies used by fraudsters. Detecting and preventing fraudulent activity is crucial, but traditional approaches frequently require significant financial resources, extensive effort, and time. This study compares the performance of five models: LightGBM, Decision Tree (DT), XGBoost (XGB), Random Forest (RF), and a Voting Classifier (VC). The models are evaluated on both the balanced 2023 European cardholders dataset and the imbalanced 2013 European cardholders dataset. To augment the minority class and improve fraud detection, three oversampling techniques (RandomOverSampler, SMOTE, and BorderlineSMOTE) were applied to the imbalanced dataset. SMOTE improved recall but reduced precision because of synthetic noise. BorderlineSMOTE, which focuses on hard-to-classify minority instances near class boundaries, improved detection and reduced false positives, producing a balanced precision-recall trade-off. RandomOverSampler combined with ensembles provided the most consistent results on the 2013 dataset, achieving an accuracy above 99.9% and an AUC above 97%, because it replicates existing minority instances without modifying the feature space and so avoids introducing synthetic noise. On the balanced 2023 dataset, all models achieved nearly perfect performance, with accuracy, precision, and recall exceeding 99.9% and an AUC of 1.0, while ensembles provided slightly improved stability. These results indicate that oversampling is crucial for imbalanced data, whereas balanced datasets allow for superior detection without preprocessing, and that ensembles are the most robust method in both scenarios.
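The difference between replicating minority instances and synthesizing new ones can be illustrated with a minimal sketch. This is not the study's code: the toy data, the function names `random_oversample` and `smote_like_point`, and the two-feature layout are all hypothetical. The interpolation step is a simplified stand-in for SMOTE, which in practice interpolates toward one of the k nearest minority neighbours.

```python
import random

def random_oversample(X, y, minority_label, seed=0):
    """Duplicate minority rows (sampled with replacement) until both classes
    have the same size. Assumes a binary problem. No new feature values appear."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    extra = [rng.choice(minority) for _ in range(n_majority - len(minority))]
    return X + extra, y + [minority_label] * len(extra)

def smote_like_point(a, b, rng):
    """Create one synthetic row on the line segment between two minority rows.
    Simplified: real SMOTE selects b among the k nearest minority neighbours of a."""
    t = rng.random()
    return tuple(ai + t * (bi - ai) for ai, bi in zip(a, b))

# Hypothetical toy data: 8 legitimate transactions (label 0), 2 fraudulent (label 1).
X = [(float(i), float(i % 3)) for i in range(8)] + [(10.0, 5.0), (12.0, 7.0)]
y = [0] * 8 + [1] * 2

X_ros, y_ros = random_oversample(X, y, minority_label=1)
synthetic = smote_like_point(X[8], X[9], random.Random(0))

print("class counts after random oversampling:", y_ros.count(0), y_ros.count(1))
print("synthetic SMOTE-like row:", synthetic)
```

In this sketch every row added by random oversampling is an exact copy of an observed fraud case, whereas the interpolated row is a point that never occurred in the data, which is where synthetic noise can enter when the classes overlap.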
Publication Details
Published In:
International Conference on Power, Electronics, Communications, Computing, and Intelligent Infrastructure 2026