Dealing with Imbalanced Data for Forecasting Model of Higher Education Selection of Grade 9 Students in Muang District Nakhonsawan Province
Abstract
Big data is increasingly used because most organisations need to review their performance in past years and to identify the factors, both strengths and weaknesses, that inform problem solving, operational planning, organisational development and the prediction of future outcomes. A common problem with such data is class imbalance, which reduces predictive accuracy. This study therefore aims to build a suitable multinomial logistic regression model and to compare the predictive accuracy of models fitted to imbalanced data with that of models fitted to data balanced by random over-sampling and random under-sampling. A real dataset on the higher education selection of 2,692 grade 9 students in Nakhonsawan was used. The main factor influencing students’ decisions was found to be demographic. The forecasting models achieved an accuracy of more than 60%, and the models fitted to balanced data predicted more accurately than the model fitted to imbalanced data.
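The two balancing methods named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the three class labels and the toy data below are hypothetical and chosen only for illustration. Random over-sampling duplicates randomly chosen rows of the smaller classes until every class is as large as the largest one, while random under-sampling keeps a random subset of each class sized to the smallest one.

```python
import random
from collections import Counter, defaultdict


def _group_by_label(labels):
    """Group row indices by their class label."""
    groups = defaultdict(list)
    for i, label in enumerate(labels):
        groups[label].append(i)
    return groups


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until
    every class matches the size of the largest class."""
    rng = random.Random(seed)
    groups = _group_by_label(labels)
    target = max(len(idx) for idx in groups.values())
    chosen = []
    for idx in groups.values():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(idx) for _ in range(target - len(idx))]
        chosen.extend(idx + extra)
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(labels)
    target = min(len(idx) for idx in groups.values())
    chosen = []
    for idx in groups.values():
        # Sample without replacement so no row is kept twice.
        chosen.extend(rng.sample(idx, target))
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


# Hypothetical toy data: three made-up outcome classes, not the study's data.
labels = ["general"] * 6 + ["vocational"] * 3 + ["other"] * 1
rows = [[i] for i in range(len(labels))]

_, over_labels = random_oversample(rows, labels)
_, under_labels = random_undersample(rows, labels)
over_counts = Counter(over_labels)
under_counts = Counter(under_labels)
print(over_counts)
print(under_counts)
```

In a comparison such as the one described, the resampling would normally be applied to the training portion only, so that accuracy is measured on data with the original class proportions. Whether the study did so is not stated in the abstract.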
Article Details
The content within the published articles, including images and tables, is copyrighted by Rajamangala University of Technology Rattanakosin. Any use of the article's content, text, ideas, images, or tables for commercial purposes in various formats requires permission from the journal's editorial board.
Rajamangala University of Technology Rattanakosin permits the use and dissemination of article files under the condition that proper attribution to the journal is provided and the content is not used for commercial purposes.
The opinions and views expressed in the articles are solely those of the respective authors and are not associated with Rajamangala University of Technology Rattanakosin or other faculty members in the university. The authors bear full responsibility for the content of their articles, including any errors, and are responsible for the content and editorial review. The editorial board is not responsible for the content or views expressed in the articles.