AdjcorT-RBFNN for Air Quality Classification: Mitigating Multicollinearity with Real and Simulated Data
DOI: 10.32526/ennrj/23/20250006
Abstract
Air pollution remains a significant issue worldwide despite technological advances, driven largely by rapid industrialization and urbanization. Among the various pollutants, PM2.5 has a major impact on air quality and poses health risks such as respiratory and cardiovascular diseases, so accurate prediction of PM2.5 levels is essential for effective air quality management. However, multicollinearity in air quality data can degrade model performance. To address this issue, this study introduces AdjcorT-RBFNN, a radial basis function neural network (RBFNN) preceded by a two-stage feature selection method, to classify air quality in Klang, Selangor. From 10 candidate variables, AdjcorT-RBFNN selects an optimal subset of 9 features and outperforms the baseline RBFNN model, which uses all 10 variables. With 7 hidden nodes and a learning rate of 0.01 for both models, AdjcorT-RBFNN achieves higher accuracy (0.62), sensitivity (0.64), specificity (0.60), precision (0.60), F1 score (0.62), and AUROC (0.62) than RBFNN, supporting its effectiveness for this classification task. The optimal features for predicting air quality in Klang are PM2.5, PM10, relative humidity, SO2, wind direction, O3, CO, ambient temperature, and NO2. Monte Carlo simulations further validate the model: AdjcorT-RBFNN consistently outperforms RBFNN, especially under strong negative correlation (ρ = -0.8), and larger sample sizes (N = 150 and 200) further improve classification accuracy. Compared with RBFNN, AdjcorT-RBFNN improves class discrimination and reduces false positives, making it more reliable at identifying the true class. These findings highlight the importance of feature selection for model performance, particularly in datasets with multicollinearity. Researchers and health organizations can use AdjcorT-RBFNN for more accurate air quality predictions to support informed pollution control strategies.
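As a rough illustration of the classifier stage, the sketch below implements a small RBFNN using the hidden-node count (7) and learning rate (0.01) stated in the abstract. Everything else is an assumption, not the authors' configuration: centres are randomly chosen training points, all hidden units share one Gaussian width set to the mean distance between centres, and a sigmoid output unit is trained by batch gradient descent on cross-entropy. The class name `RBFNN` and its parameters are hypothetical.

```python
import numpy as np


class RBFNN:
    """Minimal radial basis function network for binary labels (illustrative sketch)."""

    def __init__(self, n_hidden=7, learning_rate=0.01, epochs=500, seed=0):
        self.n_hidden = n_hidden
        self.lr = learning_rate
        self.epochs = epochs
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Gaussian activation of every sample with respect to every centre.
        d2 = ((X[:, None, :] - self.centres[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * self.width ** 2))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Assumption: centres are randomly chosen training samples.
        idx = self.rng.choice(len(X), size=self.n_hidden, replace=False)
        self.centres = X[idx]
        # Assumption: one shared width = mean pairwise distance between centres.
        dists = np.linalg.norm(self.centres[:, None, :] - self.centres[None, :, :], axis=2)
        self.width = max(dists[np.triu_indices(self.n_hidden, k=1)].mean(), 1e-8)
        H = self._hidden(X)
        self.w = np.zeros(self.n_hidden)
        self.b = 0.0
        for _ in range(self.epochs):
            p = 1.0 / (1.0 + np.exp(-(H @ self.w + self.b)))
            err = p - y  # gradient of mean cross-entropy w.r.t. the output logit
            self.w -= self.lr * H.T @ err / len(X)
            self.b -= self.lr * err.mean()
        return self

    def predict_proba(self, X):
        H = self._hidden(np.asarray(X, dtype=float))
        return 1.0 / (1.0 + np.exp(-(H @ self.w + self.b)))

    def predict(self, X):
        return (self.predict_proba(X) >= 0.5).astype(int)


# Synthetic two-class demo (not the Klang data).
rng = np.random.default_rng(42)
X_demo = np.vstack([rng.normal(-1.0, 1.0, (50, 3)), rng.normal(1.0, 1.0, (50, 3))])
y_demo = np.array([0] * 50 + [1] * 50)
model = RBFNN(n_hidden=7, learning_rate=0.01).fit(X_demo, y_demo)
print("training accuracy:", (model.predict(X_demo) == y_demo).mean())
```

Fitting the same class once on all 10 variables and once on a selected subset mirrors the RBFNN versus AdjcorT-RBFNN comparison only in spirit; the two-stage feature selection itself is not reproduced here.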
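The evaluation metrics listed above follow standard definitions based on the binary confusion matrix. The sketch below computes them from counts of true positives (`tp`), false positives (`fp`), true negatives (`tn`), and false negatives (`fn`); the function names and toy counts are illustrative assumptions, not values from the study. AUROC is computed here in its pairwise (Mann-Whitney) form: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting one half.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall, true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def auroc(scores, labels):
    """AUROC via pairwise comparison of positive and negative scores; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical counts, for illustration only.
print(binary_metrics(tp=8, fp=2, tn=6, fn=4))
# Hard 0/1 predictions used as scores: AUROC equals (sensitivity + specificity) / 2.
print(auroc([1, 1, 0, 1, 0], [1, 1, 1, 0, 0]))
```

When the scores are hard 0/1 predictions, the pairwise AUROC reduces to the mean of sensitivity and specificity. The reported sensitivity (0.64) and specificity (0.60) average to the reported AUROC (0.62), which is consistent with (but does not establish) an AUROC computed from class labels; the abstract does not say which was used.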
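The abstract does not describe the simulation design, so the following is only a minimal sketch of how multicollinear data for such a Monte Carlo study might be generated. It assumes, hypothetically, 10 predictors in which the first two are bivariate normal with correlation ρ and the rest are independent, plus a binary label drawn through a logistic link with made-up coefficients. Confining ρ = -0.8 to one pair is deliberate: an equicorrelation matrix over p = 10 predictors is only valid for ρ > -1/(p - 1) = -1/9. The names `simulate_dataset` and `monte_carlo_correlation` are hypothetical.

```python
import numpy as np


def simulate_dataset(n, rho, n_features=10, seed=None):
    """Draw one hypothetical multicollinear dataset.

    Features 0 and 1 are bivariate normal with correlation `rho`; the other
    features are independent standard normals. The binary label depends on
    features 0 and 1 through a logistic link with hypothetical coefficients.
    """
    rng = np.random.default_rng(seed)
    cov = np.eye(n_features)
    cov[0, 1] = cov[1, 0] = rho  # positive definite for any -1 < rho < 1
    X = rng.multivariate_normal(np.zeros(n_features), cov, size=n)
    beta = np.zeros(n_features)
    beta[:2] = 1.0  # hypothetical effect sizes
    prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
    y = rng.binomial(1, prob)
    return X, y


def monte_carlo_correlation(n, rho, replications=500, seed=0):
    """Mean and spread of the sample correlation of features 0 and 1 across replications."""
    rng = np.random.default_rng(seed)
    rs = []
    for _ in range(replications):
        X, _ = simulate_dataset(n, rho, seed=rng.integers(2**32))
        rs.append(np.corrcoef(X[:, 0], X[:, 1])[0, 1])
    return float(np.mean(rs)), float(np.std(rs))


for n in (150, 200):
    mean_r, sd_r = monte_carlo_correlation(n, rho=-0.8)
    print(f"N={n}: mean r = {mean_r:.3f}, sd = {sd_r:.3f}")
```

In a full study, each replication would also fit both classifiers on the simulated data and record their metrics, so that performance can be averaged over replications for each combination of ρ and N.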

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Published articles are under the copyright of the Environment and Natural Resources Journal, effective when the article is accepted for publication, thus granting the Environment and Natural Resources Journal all rights to the work so that both parties are protected from the consequences of unauthorized use. Partial or full publication of an article elsewhere is possible only with the consent of the editors.