Data Mining Model Approach for Employment Prediction for University Graduates
Abstract
Graduate employability prediction has become increasingly important as universities seek to understand the factors influencing employment outcomes and to improve academic planning. However, prior studies in this area often rely on limited datasets, evaluate only a narrow range of models, and lack systematic feature assessment, which restricts the robustness and generalizability of their findings. This study addresses these limitations by developing a comprehensive multi-model data-mining framework to predict the employability of graduates from Rajamangala University of Technology Lanna (RMUTL). A dataset of 4,352 graduate records from the 2023 academic year was analyzed. Three filter-based feature-selection techniques—chi-square, information gain, and correlation-based evaluation—were applied to identify the most influential predictors. Five machine-learning algorithms (Decision Tree, Random Forest, Gradient Boosted Trees, Naïve Bayes, and K-Nearest Neighbors) were trained and evaluated using accuracy, precision, recall, F1-score, and AUC. The results show that Random Forest achieved the highest accuracy (83.72%), while Gradient Boosted Trees yielded the highest AUC (0.813), indicating superior class-separation performance. Key predictive factors identified across the models included curriculum, education level, department, faculty, campus, gender, and GPA level. This study provides a structured comparative modeling framework and identifies institution-specific predictors of graduate employability. The findings offer practical implications for curriculum enhancement, evidence-based academic planning, and career-guidance development aimed at improving employment outcomes for RMUTL graduates.
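The workflow described in the abstract—filter-based feature selection followed by training and comparing five classifiers on accuracy and AUC—can be sketched in scikit-learn. This is a minimal illustration on synthetic data, not the study's actual code: the dataset, the number of selected features, and all hyperparameters here are assumptions, and mutual information stands in for the information-gain criterion.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the graduate dataset: predictors such as
# curriculum, faculty, campus, gender, and GPA level would be encoded
# numerically before this step.
X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=5, random_state=42)

# Filter-based feature selection (mutual information approximates the
# information-gain criterion used in the study).
X_sel = SelectKBest(mutual_info_classif, k=7).fit_transform(X, y)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_sel, y, test_size=0.3, stratify=y, random_state=42)

# The five algorithm families compared in the study.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Gradient Boosted Trees": GradientBoostingClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
    "K-Nearest Neighbors": KNeighborsClassifier(),
}

results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]  # score for the positive class
    results[name] = {
        "accuracy": accuracy_score(y_te, model.predict(X_te)),
        "auc": roc_auc_score(y_te, proba),
    }

for name, m in results.items():
    print(f"{name}: accuracy={m['accuracy']:.3f}, AUC={m['auc']:.3f}")
```

Precision, recall, and F1 could be added to the same loop via `sklearn.metrics.classification_report`; comparing accuracy and AUC side by side mirrors the study's observation that the most accurate model (Random Forest) need not be the one with the best class separation (Gradient Boosted Trees).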
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.