Feature Selection Methods for Imputation Missing Values of Time Series Data using Data Mining

Authors

  • Kronsirinut Rothjanawan, Princess of Naradhiwas University
  • Wiyuda Phetjirachotkul

Keywords:

Feature Selection, Data Imputation, Missing Values, Time Series Data, Data Mining

Abstract

This research proposes a feature selection method for imputing missing values in time series data from the Royal Irrigation Department, Royal Thai Army, covering a five-year period and consisting of 12 variables. The proposed method is a voting scheme over five feature selection techniques: Principal Components Analysis (PCA), Correlation-based Feature Selection (CFS), the ReliefF algorithm (ReliefF), Gain Ratio (GR), and Information Gain (IG). A Multilayer Perceptron neural network was used as the missing-value imputation model. To evaluate the method, the researchers started from the complete data and randomly removed 5, 10, 15, 20, 25 and 30% of the values to simulate missing data. The experiments selected 9 of the 12 variables: variables 1, 2, 3, 4, 5, 6, 7, 8 and 10. Ten Multilayer Perceptron architectures were then tested: 9-3-1, 9-5-1, 9-10-1, 9-15-1, 9-20-1, 9-25-1, 9-30-1, 9-35-1, 9-40-1 and 9-45-1 (inputs-hidden neurons-outputs). Under 10-fold cross-validation, the 9-30-1 model performed best, with the lowest MSE of 0.669.
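The evaluation protocol described above (vote across five selectors, hide a known fraction of values, score the imputations by MSE) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how many votes a variable needs, so the sketch assumes a simple threshold (`min_votes`), and it assumes each of PCA, CFS, ReliefF, GR and IG has already been reduced to a set of kept variable indices. The function names `vote_features`, `mask_at_random` and `mse` are hypothetical, and the Multilayer Perceptron imputer itself is not included; the demo uses a naive mean fill purely as a stand-in.

```python
import random
from collections import Counter


def vote_features(selections, min_votes):
    """Threshold vote over several feature selectors.

    `selections` maps a selector name to the collection of variable indices
    it kept. ASSUMPTION: a variable is kept if at least `min_votes`
    selectors chose it; the abstract does not state the actual rule.
    """
    votes = Counter()
    for chosen in selections.values():
        votes.update(set(chosen))  # each selector votes at most once per variable
    return sorted(v for v, n in votes.items() if n >= min_votes)


def mask_at_random(values, rate, seed=None):
    """Simulate missingness by hiding a fraction `rate` of `values`.

    Returns (masked_copy, positions): masked_copy has None at the hidden
    positions, and positions records where true values were removed so
    that imputations can be scored against them.
    """
    rng = random.Random(seed)
    n_missing = round(rate * len(values))
    positions = sorted(rng.sample(range(len(values)), n_missing))
    masked = list(values)
    for i in positions:
        masked[i] = None
    return masked, positions


def mse(true_values, imputed_values, positions):
    """Mean squared error computed only over the masked positions."""
    if not positions:
        raise ValueError("no masked positions to score")
    errors = [(true_values[i] - imputed_values[i]) ** 2 for i in positions]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # HYPOTHETICAL per-selector outputs over variables 1..12, invented for
    # illustration only; they are not the paper's per-method results.
    demo = {
        "PCA": {1, 2, 3, 4, 5, 6, 7, 8},
        "CFS": {1, 2, 3, 5, 7, 10},
        "ReliefF": {1, 2, 4, 6, 8, 10, 11},
        "GR": {2, 3, 4, 5, 6, 9, 10},
        "IG": {1, 3, 5, 7, 8, 10, 12},
    }
    print(vote_features(demo, min_votes=3))  # [1, 2, 3, 4, 5, 6, 7, 8, 10]

    # Hide 25% of a toy series, then score a naive mean fill
    # (a stand-in for the MLP imputer, not the paper's model).
    series = [float(x) for x in range(20)]
    masked, hidden = mask_at_random(series, rate=0.25, seed=42)
    observed = [v for v in masked if v is not None]
    fill = sum(observed) / len(observed)
    imputed = [fill if v is None else v for v in masked]
    print(len(hidden), mse(series, imputed, hidden))
```

With the invented per-selector sets in the demo, a threshold of 3 of 5 votes happens to return the nine variables listed in the abstract; the real per-method selections are not reported there. In the study's setting, the imputed values would instead come from an MLP trained on the selected variables, repeated for each missing rate and scored under 10-fold cross-validation.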

Published

2021-05-12

How to Cite

Rothjanawan, K., & Phetjirachotkul, W. (2021). Feature Selection Methods for Imputation Missing Values of Time Series Data using Data Mining. Princess of Naradhiwas University Journal, 13(2), 326–341. Retrieved from https://li01.tci-thaijo.org/index.php/pnujr/article/view/248365

Section

Research Article