Time series data enrichment using semantic information for dengue incidence forecasting
Abstract
Forecasting dengue incidence with time series models helps public health authorities anticipate and prepare for outbreaks and reduce morbidity. Previous works have indicated that many potential predictors are significant factors for improving the accuracy and effectiveness of prediction. However, these factors are usually used as dependent variables and are rarely used to identify and exploit data relationships in time series approaches. Therefore, the purpose of this study was to enrich time series data with semantic information and knowledge from a dengue fever ontology model in order to improve the capability of time series methods to forecast dengue incidence in the provinces of Thailand. This paper introduces a new technique, autoregressive integrated moving average (ARIMA) with semantic data (ARIMAS), and compares it with the classical time series approaches ARIMA and ARIMAX. The root mean squared error (RMSE) of ARIMA and ARIMAX was 25.97 and 27.45, respectively, whereas that of ARIMAS was 24.29. The predicted values of ARIMAS were thus closer to the observed data than those obtained from the traditional time series techniques. In addition, forecast performance improved significantly for unusual periods with fluctuating incidence. In 2013 and 2015, the RMSE of ARIMAS was 39.78 and 48.09, respectively, whereas that of ARIMA was 61.32 and 71.66, respectively. Both years witnessed a large dengue fever epidemic and have been explored in previous studies.
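The RMSE figures quoted in the abstract can be put side by side with a short calculation. The following is a minimal Python sketch, not the authors' code: the function names `rmse` and `relative_reduction` are hypothetical, and the only data used are the RMSE values reported above. It shows how RMSE is computed from observed and predicted series and how large the reported reductions are in relative terms.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_reduction(baseline_rmse, new_rmse):
    """Percentage by which new_rmse is lower than baseline_rmse."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


# RMSE values as reported in the abstract (no other data is assumed).
reported = {
    "overall": {"ARIMA": 25.97, "ARIMAX": 27.45, "ARIMAS": 24.29},
    "2013": {"ARIMA": 61.32, "ARIMAS": 39.78},
    "2015": {"ARIMA": 71.66, "ARIMAS": 48.09},
}

if __name__ == "__main__":
    for period, scores in reported.items():
        for baseline in ("ARIMA", "ARIMAX"):
            if baseline in scores:
                pct = relative_reduction(scores[baseline], scores["ARIMAS"])
                print(f"{period}: ARIMAS vs {baseline}: {pct:.1f}% lower RMSE")
```

Under these reported values, the relative reduction is considerably larger in the epidemic years 2013 and 2015 than over the whole evaluation period, which is consistent with the abstract's statement about unusual periods.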
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.