An Effect of Window Size in Missing Value Imputation for Multivariate Time Series Data
Keywords:
Imputation, Sliding window, Multivariate data
Abstract
Insufficient or incomplete data cannot be analyzed effectively, so missing values must be handled before analysis. In the imputation process, data windowing is used to feed existing observations into the prediction of missing ones. This research studies the effect of the data window size on missing value imputation in multivariate time series data. It focuses on finding a suitable window size for the training data used to impute missing values, using four techniques: Row Average, K-Nearest Neighbor (KNN), Fuzzy Logic System, and Artificial Neural Network (ANN). The data used in the experiment were water levels above the dam and at the end of the dam, comprising 12 variables. The methodology consisted of randomly generating missing data, building imputation models, and evaluating model performance. The window sizes tested were the past 3, 7, and 14 days. The results show that the appropriate window size was the past 14 days: the prediction model yielded its best imputation performance on test data when data from the past 14 days were used.
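The workflow described in the abstract (randomly removing values, imputing them from a past window, and scoring the result) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a single synthetic water-level series instead of the 12-variable dam data, uses a plain past-window mean as a stand-in for the four techniques named above, and scores with RMSE, which the abstract does not specify. The function names `mask_random`, `impute_past_window`, and `rmse` are hypothetical.

```python
import math
import random


def mask_random(series, fraction, seed=0):
    """Copy the series and replace a random fraction of values with None.

    Returns the masked copy and the sorted list of masked indices.
    Assumes fraction < 1 so that some observed values remain.
    """
    rng = random.Random(seed)
    n_missing = int(len(series) * fraction)
    indices = sorted(rng.sample(range(len(series)), n_missing))
    masked = list(series)
    for i in indices:
        masked[i] = None
    return masked, indices


def impute_past_window(series, window):
    """Fill each None with the mean of the observed values in the
    preceding `window` time steps (a simple stand-in technique).

    If the window holds no observed value, fall back to the mean of
    all observed values in the series.
    """
    observed = [v for v in series if v is not None]
    fallback = sum(observed) / len(observed)
    filled = list(series)
    for t, value in enumerate(series):
        if value is None:
            past = [v for v in series[max(0, t - window):t] if v is not None]
            filled[t] = sum(past) / len(past) if past else fallback
    return filled


def rmse(truth, filled, indices):
    """Root mean squared error over the positions that were masked."""
    squared = sum((truth[i] - filled[i]) ** 2 for i in indices)
    return math.sqrt(squared / len(indices))


if __name__ == "__main__":
    # Synthetic daily water level: slow wave plus noise (illustrative only).
    rng = random.Random(42)
    levels = [
        100 + 5 * math.sin(2 * math.pi * day / 60) + rng.gauss(0, 0.3)
        for day in range(365)
    ]
    masked, missing_idx = mask_random(levels, fraction=0.1, seed=1)

    # Compare the three window sizes studied in the paper.
    for window in (3, 7, 14):
        filled = impute_past_window(masked, window)
        print(f"window={window:2d} days  RMSE={rmse(levels, filled, missing_idx):.3f}")
```

Which window scores best in this sketch depends on the synthetic series and the stand-in imputer, so its output says nothing about the paper's finding. The point is only the structure of the comparison: the same masked positions are imputed once per window size and scored against the held-out true values.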