Intelligent Assessment of Athlete Physical Fitness: Addressing Data Imbalance
Abstract
This study aims to mitigate the impact of imbalanced data through oversampling and to develop supervised learning models for assessing the physical fitness of youth athletes. The dataset comprises the physical fitness test results of 75 athletes aged 11 to 16 years and presents two major challenges: a limited sample size and a significant class imbalance, with certain fitness levels underrepresented. Class imbalance can substantially degrade the performance of classification models, as it often produces predictions biased toward the majority class while the models fail to learn the characteristics of minority classes, which may be the most critical in practice. To address this issue, the Synthetic Minority Oversampling Technique (SMOTE) was employed to synthetically balance the class distribution. Five supervised learning algorithms were evaluated: Light Gradient Boosting Machine, Decision Tree, Random Forest, Neural Network, and Multinomial Logistic Regression. The Light Gradient Boosting Machine model yielded the highest accuracy at 87.76%, followed by the Decision Tree, Random Forest, and Neural Network models, each with an accuracy of 79.59%. The Multinomial Logistic Regression model achieved the lowest accuracy at 75.51%. On average, classification accuracy across all models improved to 81.41%, a 12.23% increase over training on the original imbalanced dataset. These results demonstrate that oversampling techniques such as SMOTE can effectively alleviate the effects of class imbalance and enhance the predictive performance of machine learning models in physical fitness assessment.
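The SMOTE step described above can be sketched in a few lines. The following is a minimal NumPy illustration of the interpolation idea from Chawla et al. (2002), not the implementation used in the study; the function name and parameters are illustrative, and practical work would typically rely on an existing library such as imbalanced-learn.

```python
import numpy as np

def smote(X_min, n_synthetic, k=5, rng=None):
    """Generate synthetic minority-class samples by interpolating each
    chosen sample toward one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)              # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]        # k nearest neighbours per sample
    base = rng.integers(0, n, n_synthetic)   # pick a base sample for each synthetic point
    neigh = nn[base, rng.integers(0, k, n_synthetic)]  # pick one of its neighbours
    gap = rng.random((n_synthetic, 1))       # interpolation factor in [0, 1)
    # New point lies on the line segment between the base and its neighbour.
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

# Toy example: 5 minority samples in 2-D, generate 10 synthetic ones.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
X_new = smote(X_min, n_synthetic=10, k=3, rng=42)
print(X_new.shape)  # (10, 2)
```

Because each synthetic point is a convex combination of two real minority samples, the oversampled class stays inside the region spanned by the original data, which is what distinguishes SMOTE from simple duplication of minority examples.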
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
References
R. Sharda, D. Delen, and E. Turban, Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th ed. Harlow, UK: Pearson, 2018, pp. 33-36.
R. C. Prati, G. E. A. P. A. Batista, and M. C. Monard, “Data mining with imbalanced class distributions: concepts and methods,” in Proc. Indian Int. Conf. Artif. Intell. (IICAI), 2009, pp. 359-376.
G. Lemaître, F. Nogueira, and C. K. Aridas, “Imbalanced-learn: A Python toolbox to tackle the curse of imbalanced datasets in machine learning,” J. Mach. Learn. Res., vol. 18, no. 17, pp. 1-5, 2017.
U. Ninrutsirikun, H. Imai, B. Watanapa, and C. Arpnikanondt, “Principal component clustered factors for determining study performance in computer programming class,” Wireless Pers. Commun., vol. 115, no. 4, pp. 2897-2916, Dec. 2020.
R. Liu, “A novel synthetic minority oversampling technique based on relative and absolute densities for imbalanced classification,” Appl. Intell., vol. 53, pp. 786-803, Jan. 2023.
H. He and E. A. Garcia, “Learning from imbalanced data,” IEEE Trans. Knowl. Data Eng., vol. 21, no. 9, pp. 1263-1284, Sep. 2009.
Q. Yang and X. Wu, “10 challenging problems in data mining research,” Int. J. Inf. Technol. Decis. Mak., vol. 5, no. 4, pp. 597-604, Dec. 2006, https://doi.org/10.1142/S0219622006002258
F. Pedregosa et al., “Scikit-learn: Machine learning in Python,” J. Mach. Learn. Res., vol. 12, pp. 2825-2830, Oct. 2011.
T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., 2016, pp. 785-794.
A. Vaswani et al., “Attention is all you need,” in Adv. Neural Inf. Process. Syst., vol. 30, pp. 5998-6008, Aug. 2017, https://doi.org/10.48550/arXiv.1706.03762
S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” in Adv. Neural Inf. Process. Syst., vol. 30, pp. 4765-4774, Nov. 2017, https://doi.org/10.48550/arXiv.1705.07874
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770-778.
N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” J. Artif. Intell. Res., vol. 16, pp. 321-357, Jun. 2002, https://doi.org/10.1613/jair.953
P. Thanathammathee and Y. Sirisathitkul, “Improved classification techniques for imbalanced dataset of elderly’s knee osteoarthritis,” J. Sci. Technol., vol. 27, no. 6, pp. 1164-1178, Nov.-Dec. 2019.
S. Wannont and R. Muangsarn, “Improving prediction models of student business career using sampling techniques for learning in multi-classes imbalance dataset,” Chaiyaphum Parithat J., vol. 4, no. 1, pp. 39-49, Jan.-Apr. 2021.
N. Rachburee and W. Punlumjeak, “Oversampling technique in student performance classification from engineering course,” Int. J. Electr. Comput. Eng., vol. 11, no. 4, pp. 3567-3574, Aug. 2021.
T. Wongvorachan, S. He, and O. Bulut, “A comparison of undersampling, oversampling, and SMOTE methods for dealing with imbalanced classification in educational data mining,” Information, vol. 14, no. 1, p. 54, Jan. 2023.
C. Masachai, N. Srisahno, W. Masachai, and W. Buathong, “Relationship physical fitness assessment results of students with data mining at Rajaprachanugroh 1 school,” Ind. Technol. Lampang Rajabhat Univ., vol. 14, no. 2, pp. 1-11, Jul.-Dec. 2021.
S. Kusum, C. Chiewsakul, J. Naksri, N. Mudchanthuek, and W. Deeniwong, “Physical fitness test and standard guidelines for youth athletes,” Regional Sports Science Division, Sports Authority of Thailand, Region 3, Ministry of Tourism and Sports, 2019. [Online]. Available: https://set3.org [Accessed: Apr. 25, 2024].
H. He, Y. Bai, E. A. Garcia, and S. Li, “ADASYN: Adaptive synthetic sampling approach for imbalanced learning,” in Proc. IEEE Int. Joint Conf. Neural Netw. (IJCNN), 2008, pp. 1322-1328.
D. Elreedy and A. F. Atiya, “A theoretical distribution analysis of the synthetic minority oversampling technique,” Mach. Learn., vol. 111, no. 1, pp. 157-180, Jan. 2024.
R. Liu, “A novel synthetic minority oversampling technique based on relative and absolute densities for imbalanced classification,” Appl. Intell., vol. 53, pp. 786-803, Jan. 2023, https://doi.org/10.1007/s10489-022-03512-5
G. Douzas and F. Bacao, “Geometric SMOTE: Effective oversampling for imbalanced learning through a geometric extension of SMOTE,” Inf. Sci., vol. 465, pp. 1-20, Sep. 2017.
RapidMiner, Inc., “RapidMiner Studio, version 9.10,” docs.rapidminer.com, 2024. [Online]. Available: https://www.rapidminer.com [Accessed: May 25, 2024].
M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, “The WEKA data mining software: An update,” ACM SIGKDD Explor. Newsl., vol. 11, no. 1, pp. 10-18, Nov. 2009, https://doi.org/10.1145/1656274.1656278
G. van Rossum and F. L. Drake, Python 3 Reference Manual. Scotts Valley, CA, USA: CreateSpace, 2009. [Online]. Available: https://dl.acm.org/doi/abs/10.5555/1593511 [Accessed: Mar. 20, 2009].
R. Kohavi and F. Provost, “Glossary of terms,” Mach. Learn., vol. 30, no. 2-3, pp. 271-274, Jan. 1998.