Development of a Forecasting Model for PM2.5 Concentrations Using Machine Learning Techniques: A Case Study in Thailand
Abstract
Fine particulate matter smaller than 2.5 micrometers (PM2.5) poses a persistent threat to public health, environmental quality, and agricultural productivity in Thailand, particularly during the dry season, when concentrations frequently exceed national standards. This study develops a PM2.5 forecasting model by comparing two time-series approaches (Holt–Winters exponential smoothing and ARIMA) with two machine learning techniques (Random Forest and Support Vector Regression, SVR). Monthly PM2.5 data for Bangkok, Chiang Mai, Khon Kaen, and Songkhla covering 2018–2024 were used. The dataset was split chronologically into an 80% training set and a 20% test set, and model performance was evaluated using the Root Mean Squared Error (RMSE) and the Mean Absolute Error (MAE). The Random Forest model consistently achieved the lowest prediction errors across all provinces, particularly in areas with highly volatile PM2.5 patterns, such as Chiang Mai and Khon Kaen. In contrast, SVR yielded relatively low predictive accuracy. The traditional time-series models performed well in provinces with more stable air quality patterns, such as Songkhla. Lag variables and moving averages were identified as the key predictors contributing to model accuracy. Overall, the Random Forest model shows strong potential for use in air quality alert systems and for supporting evidence-based environmental and agricultural policy planning toward long-term sustainability.
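The evaluation setup described in the abstract (lag and moving-average predictors, a chronological 80/20 split, and RMSE/MAE scoring) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the choice of three lags and a three-month window, and the synthetic seasonal series are all assumptions made for illustration, and a simple persistence forecast stands in for the fitted models.

```python
import math

def make_features(series, n_lags=3, window=3):
    """Turn a series into rows of [lag_1..lag_n, moving average] with the current value as target.

    n_lags and window are illustrative choices, not values from the study.
    """
    start = max(n_lags, window)
    X, y = [], []
    for t in range(start, len(series)):
        lags = [series[t - k] for k in range(1, n_lags + 1)]  # lag_1 is the previous month
        moving_avg = sum(series[t - window:t]) / window       # mean of the previous `window` months
        X.append(lags + [moving_avg])
        y.append(series[t])
    return X, y

def chronological_split(X, y, train_frac=0.8):
    """Split without shuffling so that the test set lies strictly after the training set."""
    cut = int(len(X) * train_frac)
    return X[:cut], y[:cut], X[cut:], y[cut:]

def rmse(actual, predicted):
    """Root Mean Squared Error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Synthetic monthly series (84 months, i.e. 7 years) with a yearly cycle.
# This is made-up data for illustration only, NOT real PM2.5 measurements.
series = [30 + 20 * math.cos(2 * math.pi * m / 12) for m in range(84)]

X, y = make_features(series)
X_train, y_train, X_test, y_test = chronological_split(X, y)

# Placeholder predictor (assumption): persistence, i.e. next month equals lag_1.
# A trained regressor would be fit on X_train/y_train and predict on X_test instead.
persistence = [row[0] for row in X_test]

print("test months:", len(y_test))
print("RMSE:", rmse(y_test, persistence))
print("MAE:", mae(y_test, persistence))
```

To compare actual models, the persistence forecast would be replaced by predictions from a regressor trained on `X_train` and `y_train`, for example scikit-learn's `RandomForestRegressor`, while keeping the split and the two metrics unchanged so that the errors remain comparable.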