An Application of Machine Learning Techniques for Loan Default Payment Prediction
Abstract
In the banking business, predicting customer payment defaults has become crucial for preventing and mitigating the risks caused by non-performing loans. Machine learning techniques are now used alongside traditional methods for this task. This paper explores several ways to apply machine learning to default payment prediction. The prediction development framework comprises data encoding, data sampling, and model development. At each step, various techniques are tested and compared to find the solutions that best fit business requirements. Our findings indicate that ensemble models are a better choice than a single model for increasing the precision of the default payment class. Over-sampling is a suitable choice for increasing recall of the default payment class, whereas under-sampling is not recommended. Furthermore, if the size of the input vector is a concern, Weight of Evidence encoding can be used instead of one-hot encoding without a loss in performance.
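The point about input-vector size can be illustrated with a minimal sketch of Weight of Evidence encoding. It is not the authors' implementation: the function name `woe_encode`, the `smoothing` parameter, and the toy data are hypothetical, and the sign convention (log of the default share over the non-default share) is an assumption, since conventions differ between libraries. Each category is mapped to a single number, so a feature with k levels occupies one column instead of the k columns that one-hot encoding would need.

```python
import math
from collections import Counter


def woe_encode(categories, labels, smoothing=0.5):
    """Return a dict mapping each category to its Weight of Evidence.

    Assumed convention: WoE = ln(share of defaults / share of non-defaults),
    where label 1 means default. Additive smoothing avoids log(0) for
    categories that appear in only one class.
    """
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    bad = Counter(c for c, y in zip(categories, labels) if y == 1)
    good = Counter(c for c, y in zip(categories, labels) if y == 0)
    levels = set(categories)
    k = len(levels)

    woe = {}
    for c in levels:
        p_bad = (bad[c] + smoothing) / (n_bad + smoothing * k)
        p_good = (good[c] + smoothing) / (n_good + smoothing * k)
        woe[c] = math.log(p_bad / p_good)
    return woe


# Toy data (hypothetical): loan purpose and whether the loan defaulted.
purpose = ["car", "car", "home", "home", "home", "travel"]
default = [1, 0, 0, 0, 1, 1]

table = woe_encode(purpose, default)
encoded = [table[c] for c in purpose]  # one numeric column, not three
```

In practice the mapping would be fitted on training data only and then applied to validation and test data, so that label information does not leak into the evaluation.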
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.