Prediction of PM2.5 Dust Concentration Using Machine Learning Techniques
To support forecasting, machine learning is used to study the prediction of PM2.5 dust concentrations in advance.
Abstract
Fine particulate matter (PM2.5) is a major air pollution problem, particularly in urban areas with high emissions such as Bangkok, Thailand, and Guangzhou, China, which is one of the most populous cities in the world. This research develops a system for predicting PM2.5 concentration using three machine learning techniques, Linear Regression (LR), Support Vector Regression (SVR), and XGBoost, to aid air pollution monitoring and management. The dataset is a secondary source containing PM2.5 values recorded in Guangzhou, China, between 2010 and 2015. Experimental comparison of the three models shows that XGBoost achieves the highest predictive performance across all forecasting timeframes. For the 1-hour-ahead prediction, the XGBoost model that incorporates historical PM2.5 averages and seasonal data achieved an R² of 0.6728, an MAE of 12.06, and an RMSE of 17.87, outperforming both LR and SVR. The predictive performance of all models declined as the forecasting timeframe increased, but XGBoost still outperformed the other methods in every scenario. Including seasonal information and historical PM2.5 averages improved the model's ability to predict future PM2.5 concentrations.
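As an illustration of the setup the abstract describes, the following minimal Python sketch builds h-step-ahead samples from a trailing PM2.5 average plus a season label, then scores forecasts with MAE, RMSE, and R². It is not the authors' code: the function names, the 3-hour window, the month-to-season mapping, and the readings are all assumptions made for illustration, and a naive baseline (the trailing average itself) stands in for the LR, SVR, and XGBoost models.

```python
import math
from datetime import datetime, timedelta


def season_of(month):
    """Map a month (1-12) to a season label (assumed mapping, not from the paper)."""
    return ("winter", "spring", "summer", "autumn")[(month % 12) // 3]


def make_samples(times, pm25, window=3, horizon=1):
    """Build (features, target) pairs for a `horizon`-step-ahead forecast.

    Features: mean of the last `window` readings and the season at time i.
    Target: the reading `horizon` steps after time i.
    """
    samples = []
    for i in range(window - 1, len(pm25) - horizon):
        past = pm25[i - window + 1 : i + 1]
        features = {
            "pm25_mean": sum(past) / window,
            "season": season_of(times[i].month),
        }
        samples.append((features, pm25[i + horizon]))
    return samples


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up hourly readings (hypothetical data, not from the Guangzhou dataset).
start = datetime(2015, 1, 1, 0)
times = [start + timedelta(hours=i) for i in range(6)]
pm25 = [40.0, 44.0, 48.0, 50.0, 46.0, 42.0]

samples = make_samples(times, pm25, window=3, horizon=1)
y_true = [target for _, target in samples]
# Naive baseline: forecast the next hour as the trailing average.
y_pred = [features["pm25_mean"] for features, _ in samples]

print("MAE :", mae(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
print("R2  :", r2(y_true, y_pred))
```

In a real pipeline the feature dictionaries would be encoded and passed to a trained regressor, and the same three metrics would be computed on a held-out test period for each forecasting horizon.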
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
