Machine Learning Approaches for Credit Risk Assessment in SMEs

Authors

  • Piyada Wongwiwat, Department of Computer Science and Data Innovation, Faculty of Science and Technology, Suansunandha Rajabhat University, Bangkok, Thailand
  • Apisak Chairojwattana, Department of Mathematics, Faculty of Science, Burapha University, Chon Buri, Thailand
  • Wikanda Phaphan, Department of Applied Statistics, Faculty of Applied Science, King Mongkut’s University of Technology North Bangkok, Bangkok, Thailand

Keywords

Credit risk models, decision tree, support vector machine, logistic regression, adaptive boosting model, AdaBoost, NPL, status of installment payments, creditworthiness

Abstract

Small and medium enterprises (SMEs) often have difficulty securing loans from financial institutions because they are perceived as high credit risks. Machine learning-based credit risk assessment (CRA) models offer a promising way to assess borrower creditworthiness more objectively. This study compared the performance of four credit assessment techniques: decision trees, support vector machines (SVMs), logistic regression, and adaptive boosting (AdaBoost). All models were implemented in Python and evaluated using K-fold cross-validation. Decision trees and adaptive boosting achieved comparable recall scores, offering a trade-off between overall accuracy and the false negative (FN) rate. Because the dataset was imbalanced, with regular loans more prevalent than non-performing loans (NPLs), it was balanced to reduce potential bias. After balancing, adaptive boosting achieved the highest overall accuracy, making it the most suitable model for predicting SME loan defaults in this study.
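
The interplay between class imbalance, overall accuracy, and the FN rate described above can be illustrated with a minimal Python sketch. It is not the authors' code: the toy labels (90 regular loans, 10 NPLs), the choice of NPL as the positive class, the helper names `summarize` and `random_oversample`, and the use of random oversampling as the balancing method are all assumptions made for illustration, since the abstract does not state how the dataset was balanced.

```python
import random
from collections import Counter


def summarize(y_true, y_pred, positive=1):
    """Return accuracy, recall and FN rate, treating `positive` as the NPL class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    actual_pos = tp + fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": tp / actual_pos if actual_pos else 0.0,
        "fn_rate": fn / actual_pos if actual_pos else 0.0,
    }


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match
    the size of the largest one (an assumed balancing method, for illustration)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical imbalanced portfolio: 0 = regular loan, 1 = NPL.
y_true = [0] * 90 + [1] * 10
rows = [[i] for i in range(len(y_true))]  # placeholder feature vectors

# A degenerate model that labels every loan as regular looks accurate
# on imbalanced data while missing every NPL.
always_regular = [0] * len(y_true)
baseline = summarize(y_true, always_regular)
print(baseline)

# After oversampling, both classes are equally represented.
balanced_rows, balanced_labels = random_oversample(rows, y_true)
print(Counter(balanced_labels))
```

On these toy labels, a model that never predicts an NPL still scores high accuracy, which is why recall and the FN rate are reported alongside accuracy. When resampling is combined with K-fold cross-validation, it is usually applied to the training folds only, so that duplicated rows do not leak into the held-out fold.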

Published

2025-06-24

How to Cite

Wongwiwat, P., Chairojwattana, A., & Phaphan, W. (2025). Machine Learning Approaches for Credit Risk Assessment in SMEs. Thailand Statistician, 23(3), 710–731. Retrieved from https://ph02.tci-thaijo.org/index.php/thaistat/article/view/259947

Section

Articles