A Two-Stage Feature Selection Method to Enhance Prediction of Daily PM2.5 Concentration Air Pollution
DOI: 10.32526/ennrj/22/20240049

Siti Khadijah Arafin
Ahmad Zia Ul-Saufie
Nor Azura Md Ghani
Nurain Ibrahim

Abstract

In recent decades, air pollution has negatively affected human health and the environment. One of the major contributors to air pollution is fine particulate matter (PM2.5). However, studies on daily PM2.5 prediction are still lacking, especially those that incorporate feature selection into the model. Hence, the main objective of this research is to apply feature selection by proposing a two-stage feature selection method that combines the adjusted correlation-sharing t-test (adjcorT) and a radial basis function neural network (RBFNN) to identify the important features, which in turn helps to enhance the prediction of daily PM2.5 concentrations. Secondary data covering five years of daily air pollutant data, from 2018 to 2022, were obtained from the Department of Environment Malaysia (DOE). The results showed that adjcorT-RBFNN identified NO2, PM2.5, PM10, CO, O3, wind speed, and SO2 as important features. For day-ahead prediction in Shah Alam, the accuracy, sensitivity, specificity, precision, F1 score, and AUROC were 0.756, 0.801, 0.717, 0.717, 0.757, and 0.758, respectively. Additionally, the prediction model may serve as a tool for an early warning system, providing local authorities with air quality information for formulating air quality improvement strategies.
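The abstract names the first stage, adjcorT, but does not define it. As a rough illustration of the idea it builds on, the sketch below computes an unadjusted correlation-sharing t-statistic: each feature's two-sample t-statistic is replaced by the largest absolute mean t-statistic among the features whose correlation with it is at least r, maximised over a grid of thresholds r. The adjustment that turns this into adjcorT is not reproduced here, and the function names, the Welch form of the t-statistic, and the threshold grid are assumptions for illustration only.

```python
import numpy as np


def t_statistics(X, y):
    """Welch two-sample t-statistic for each column of X (class 1 vs class 0)."""
    a, b = X[y == 1], X[y == 0]
    mean_diff = a.mean(axis=0) - b.mean(axis=0)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return mean_diff / se


def correlation_shared_t(X, y, thresholds=np.linspace(0.0, 1.0, 21)):
    """Unadjusted correlation-sharing statistic (an assumption, NOT the authors' adjcorT).

    For feature i and each threshold r, average the t-statistics of all features j
    with corr(x_i, x_j) >= r, take the absolute value, and keep the maximum over r.
    """
    t = t_statistics(X, y)
    corr = np.corrcoef(X, rowvar=False)
    shared = np.empty_like(t)
    for i in range(len(t)):
        best = 0.0
        for r in thresholds:
            members = corr[i] >= r
            members[i] = True  # guard against floating-point error on the diagonal
            best = max(best, abs(t[members].mean()))
        shared[i] = best
    return shared


def select_top_k(X, y, k):
    """Indices of the k features with the largest correlation-shared statistic."""
    scores = correlation_shared_t(X, y)
    return np.argsort(scores)[::-1][:k]
```

Features ranked highly by this filter would then be handed to the second-stage model; the cut-off `k` is a hypothetical parameter, not one reported in the abstract.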
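For the second stage, a radial basis function neural network maps each input to Gaussian activations around a set of centres and combines those activations linearly. The class below is a minimal, hypothetical sketch, not the authors' configuration: centres are drawn at random from the training data, the width parameter `gamma` is fixed by hand, and the output weights are fitted by least squares. Because the abstract reports sensitivity and specificity, the target is assumed to be binary (for instance, whether next-day PM2.5 exceeds some threshold; the exact definition is not given here).

```python
import numpy as np


class RBFNetwork:
    """Minimal Gaussian RBF network for binary classification (illustrative sketch).

    Hidden layer: Gaussian units centred on randomly chosen training points
    (an assumption; k-means centres are a common alternative).
    Output layer: linear weights plus a bias, fitted by least squares.
    """

    def __init__(self, n_centers=20, gamma=1.0, seed=0):
        self.n_centers = n_centers
        self.gamma = gamma
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Squared Euclidean distance from every sample to every centre: shape (n, m).
        d2 = ((X[:, None, :] - self.centers_[None, :, :]) ** 2).sum(axis=2)
        phi = np.exp(-self.gamma * d2)
        # Append a constant column so the output layer has a bias term.
        return np.hstack([phi, np.ones((len(X), 1))])

    def fit(self, X, y):
        size = min(self.n_centers, len(X))
        idx = self.rng.choice(len(X), size=size, replace=False)
        self.centers_ = X[idx]
        H = self._hidden(X)
        self.weights_, *_ = np.linalg.lstsq(H, y.astype(float), rcond=None)
        return self

    def predict_proba(self, X):
        # The linear output is clipped to [0, 1] so it can be read as a score.
        return np.clip(self._hidden(X) @ self.weights_, 0.0, 1.0)

    def predict(self, X, threshold=0.5):
        return (self.predict_proba(X) >= threshold).astype(int)
```

In practice, inputs such as the selected pollutant and wind speed features would usually be standardised first, because a single `gamma` assumes all features are on a comparable scale.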
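The six figures reported in the abstract are standard binary-classification metrics. The sketch below uses the usual confusion-matrix definitions (assumed, since the abstract does not spell them out) and the rank interpretation of AUROC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. It also checks that the reported F1 score follows from the reported precision and sensitivity.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # recall / true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    precision = tp / (tp + fp)    # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def auroc(scores, labels):
    """AUROC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Consistency check on the reported figures: F1 is the harmonic mean of
# precision (0.717) and sensitivity (0.801).
f1_reported = 2 * 0.717 * 0.801 / (0.717 + 0.801)
print(round(f1_reported, 3))  # prints 0.757
```

The pairwise AUROC loop is quadratic in the number of cases, which is fine for a few years of daily data; larger datasets would normally use a sort-based implementation.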

Article Details

How to Cite
Arafin, S. K., Ul-Saufie, A. Z., Md Ghani, N. A., & Ibrahim, N. (2024). A Two-Stage Feature Selection Method to Enhance Prediction of Daily PM2.5 Concentration Air Pollution. Environment and Natural Resources Journal, 22(6), 500–509. DOI: 10.32526/ennrj/22/20240049. Retrieved from https://ph02.tci-thaijo.org/index.php/ennrj/article/view/252885
Section: Original Research Articles