Volume 13, Issue 1, pp. 251-258, 2024 | Full Length Article
Enas A. Raheem 1,*, Ahmed M. Dinar 2, Mazin Abed Mohammed 3, Bourair AL-Attar 4
DOI: https://doi.org/10.54216/JISIoT.130118
Precise and reliable loan status prediction is essential for financial institutions. However, the scarcity of real-world data and the biases within that data can greatly reduce the accuracy of machine learning models. Loan status prediction models also face class imbalance, where one category (such as approved loans) is much more common than another (such as defaulted loans), which skews predictions towards the majority class. This study investigates Generative Adversarial Networks (GANs) as a way to augment the data and improve the performance of machine learning models. Several machine learning (ML) models, including Support Vector Machines (SVM) and ensemble bagged trees, were trained on a Kaggle loan dataset (380 samples). Baseline training and testing accuracies were 86.9% and 86.3% for SVM, and 84.5% and 82.1% for the ensemble. ActGAN (Activating Generative Networks) was then used to generate synthetic data points for both accepted and rejected loans. Retraining the models on the augmented data improved accuracy substantially: SVM training and testing accuracies rose to 94.4% and 93.4%, while the ensemble model reached 97.4% and 95.8%, respectively. Other ML models, namely KNN, decision tree, and logistic regression, were also explored and showed promising accuracy compared with the state of the art. These findings suggest that GAN-based data augmentation can enhance the performance of loan status prediction. Future research could explore the impact of different GAN architectures and assess the general applicability of this approach.
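The baseline-then-augment-then-retrain comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' pipeline: the data come from scikit-learn's `make_classification` rather than the Kaggle loan dataset, and the hypothetical helper `sample_synthetic` is a simple per-class Gaussian sampler standing in for a trained GAN generator such as ActGAN. The sketch augments only the training split so that test accuracy is measured on real rows; the abstract does not state how the study handled this, and the accuracies printed here will not match the reported figures.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def sample_synthetic(X, y, n_per_class, rng):
    """Hypothetical stand-in for a trained GAN generator (NOT ActGAN).

    Draws n_per_class rows for each label from a Gaussian fitted to that
    label's real rows. A real generator would replace this function.
    """
    parts_X, parts_y = [], []
    for label in np.unique(y):
        rows = X[y == label]
        mean = rows.mean(axis=0)
        # Small diagonal jitter keeps the covariance positive definite.
        cov = np.cov(rows, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        parts_X.append(rng.multivariate_normal(mean, cov, size=n_per_class))
        parts_y.append(np.full(n_per_class, label))
    return np.vstack(parts_X), np.concatenate(parts_y)


def make_models():
    """Fresh, unfitted copies of the two model families named in the abstract."""
    return {
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        # BaggingClassifier bags decision trees by default.
        "Bagged trees": BaggingClassifier(n_estimators=50, random_state=0),
    }


def fit_and_score(model, X_train, y_train, X_test, y_test):
    """Fit the model and return (training accuracy, testing accuracy)."""
    model.fit(X_train, y_train)
    return model.score(X_train, y_train), model.score(X_test, y_test)


rng = np.random.default_rng(0)

# Placeholder data: 380 rows with a 70/30 class imbalance (assumed values).
X, y = make_classification(
    n_samples=380, n_features=10, n_informative=5, n_redundant=0,
    weights=[0.7, 0.3], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Generate synthetic rows for both classes from the training split only.
X_syn, y_syn = sample_synthetic(X_train, y_train, n_per_class=200, rng=rng)
X_aug = np.vstack([X_train, X_syn])
y_aug = np.concatenate([y_train, y_syn])

results = {}
for name in make_models():
    baseline = fit_and_score(make_models()[name], X_train, y_train, X_test, y_test)
    augmented = fit_and_score(make_models()[name], X_aug, y_aug, X_test, y_test)
    results[name] = {"baseline": baseline, "augmented": augmented}
    print(f"{name}: baseline train/test = {baseline[0]:.3f}/{baseline[1]:.3f}, "
          f"augmented train/test = {augmented[0]:.3f}/{augmented[1]:.3f}")
```

Because the classes are imbalanced, plain accuracy can look good even when the minority class is predicted poorly, so a per-class metric such as recall would be a reasonable addition to `fit_and_score` in practice.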
Loan status, Machine learning, Generative Adversarial Networks, Prediction