Improving Loan Status Prediction Accuracy with Generative Adversarial Networks: Addressing Data Scarcity and Bias

 

Enas A. Raheem*1, Ahmed M. Dinar1, Mazin Abed Mohammed2, Bourair AL-Attar3

 

1 Computer Engineering Department, University of Technology, Baghdad, Iraq

2 Department of Artificial Intelligence, College of Computer Science and Information Technology, University of Anbar, Anbar 31001, Iraq

3 College of Medicine, University of Al-Ameed, Karbala 1238, Iraq

Abstract

Precise and reliable loan status prediction is essential for financial institutions. However, the scarcity of real-world data and the biases within that data can greatly reduce the accuracy of machine learning models. A further challenge is class imbalance, where one category (such as approved loans) is much more common than another (such as defaulted loans), which skews predictions towards the majority class. This study investigates Generative Adversarial Networks (GANs) as a means of augmenting the data and improving the performance of machine learning models. Several machine learning (ML) models, including Support Vector Machines (SVM) and ensemble bagged trees, were trained on a Kaggle loan dataset (380 samples). Baseline training and testing accuracies were 86.9% and 86.3% for SVM, and 84.5% and 82.1% for the ensemble. ActGAN (Activating Generative Networks) was then used to generate synthetic data points for both accepted and rejected loans. Retraining the models on the augmented data yielded marked improvements: SVM training and testing accuracies rose to 94.4% and 93.4%, while the ensemble model achieved 97.4% and 95.8%, respectively. Other ML models, namely KNN, decision tree, and logistic regression, were also explored and showed promising accuracy compared to the state of the art. These findings suggest that GAN-based data augmentation can enhance the performance of loan status prediction. Future research could explore the impact of different GAN architectures and assess the general applicability of this approach.

Emails: enas.a.raheem@uotechnology.edu.iq; ahmed.m.dinar@uotechnology.edu.iq; mazinalshujeary@uoanbar.edu.iq; bourair.alattar@alameed.edu.iq

 

Received: September 25, 2023 Revised: January 24, 2024 Accepted: June 16, 2024

Keywords: Loan status; Machine learning; Generative Adversarial Networks; Prediction.