Adjusted K Nearest Neighbor Method Based on Decile Mean in Missing Data Imputation
Abstract
The objective of this research was to develop a new method for estimating missing data, Decile Mean K-Nearest Neighbor Bhattacharyya Imputation (DKNN-BH). The method adjusts K-Nearest Neighbor Imputation (KNN) by using the decile mean and the Bhattacharyya distance. Its effectiveness was compared with that of Mean Imputation (MI), K-Nearest Neighbor Imputation (KNN) and Decile Mean K-Nearest Neighbor Imputation (DKNN). A Monte Carlo simulation was run for 300 cases defined by 4 factors: sample size, level of missing data, size of outliers, and the k constant used by the DKNN-BH, KNN and DKNN methods. Each case was replicated 500 times. DKNN-BH consists of 2 steps: calculating the Bhattacharyya distance, and estimating the missing data with the decile mean. In the simulation results, DKNN-BH outperformed the existing methods in all cases, giving the lowest mean square error. The simulations covered missing-data percentages of 5, 10, 20, 30 and 40, outlier percentages of 0, 5, 10 and 20, and k values of 11, 13, 15, 17 and 19. Across these settings, the lowest mean square error decreased as the percentage of outliers and the value of k decreased.
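The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions: the decile mean is taken as the average of the nine deciles D1 to D9 (following Rana et al. 2012), the Bhattacharyya distance is computed between non-negative feature rows after normalising each row to sum to 1, and the function name `dknn_bh_impute` and its arguments are hypothetical.

```python
import numpy as np


def decile_mean(values):
    """Average of the nine deciles D1..D9 of the given values."""
    values = np.asarray(values, dtype=float)
    deciles = np.percentile(values, np.arange(10, 100, 10))
    return float(deciles.mean())


def bhattacharyya_distance(p, q, eps=1e-12):
    """Bhattacharyya distance between two non-negative vectors.

    Assumption of this sketch: each vector is normalised to sum to 1
    so that it can be treated as a discrete distribution.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    bc = np.sum(np.sqrt(p * q))  # Bhattacharyya coefficient
    return float(-np.log(max(bc, eps)))


def dknn_bh_impute(X, y, k=3):
    """Fill NaN entries of y (hypothetical helper).

    Step 1: rank the rows with observed y by Bhattacharyya distance
            to the row whose y is missing.
    Step 2: impute with the decile mean of y over the k nearest rows.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).copy()
    observed = np.flatnonzero(~np.isnan(y))
    for i in np.flatnonzero(np.isnan(y)):
        d = np.array([bhattacharyya_distance(X[i], X[j]) for j in observed])
        nearest = observed[np.argsort(d)[:k]]
        y[i] = decile_mean(y[nearest])
    return y


# Toy data: the last row has a missing response value.
X = np.array([[1.0, 1.0], [1.0, 1.1], [5.0, 1.0], [1.0, 1.0]])
y = np.array([10.0, 12.0, 100.0, np.nan])
imputed = dknn_bh_impute(X, y, k=2)
```

In this toy example the outlying response of 100 belongs to a row that is far from the incomplete row, so it is not among the two neighbours used for imputation. The simulation in the study uses larger k values (11 to 19), where averaging deciles rather than raw values is what limits the influence of outliers.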
Article Details
The content and information in articles published in the Journal of Rajamangala University of Technology Srivijaya are the opinion and responsibility of the authors. The journal's editors do not necessarily agree with them and do not share any responsibility for them.
References
Bhattacharyya, A. 1943. On a measure of divergence between two statistical populations defined by their probability distributions. Bulletin of the Calcutta Mathematical Society 35: 99-109.
Bishop, C.M. 1995. Neural networks for pattern recognition. Oxford University Press, UK.
Cartwright, M.H., Shepperd, M.J. and Song, Q. 2003. Dealing with missing software project data, pp. 154-165. In Proceedings of the 9th IEEE International Software Metrics Symposium (METRICS'03). IEEE Computer Society, Sydney.
Hengpraprohm, K. and Meesad, P. 2008. Feature selection of K-Nearest Neighbor for missing value imputation using K-Nearest Neighbor. Information Technology Journal 4(7): 55-61. (in Thai)
Kim, J.O. and Curry, J. 1977. The treatment of missing data in multivariate analysis. Sociological Methods & Research 6(2): 215-240.
Kim, K.Y., Kim, B.J. and Yi, G.S. 2004. Reuse of imputed data in microarray analysis increases imputation efficiency. BMC Bioinformatics 5(1): 160.
Ladha, L. and Deepa, T. 2011. Feature selection methods and algorithms. International Journal on Computer Science and Engineering 3(5): 1787-1797.
Liao, S.G., Lin, Y., Kang, D.D., Chandra, D., Bon, J., Kaminski, N. and Tseng, G.C. 2014. Missing value imputation in high-dimensional phenomic data: imputable or not, and how? BMC Bioinformatics 15(1): 346.
Pasunon, P. and Nilakorn, P. 2007. Outliers detection in regression analysis by Bhattacharyya Statistics, pp. 11-18. In The Proceeding of 45th Kasetsart University Annual Conference. Kasetsart University, Bangkok. (in Thai)
Rana, S., Siraj-Ud-Doulah, M., Midi, H. and Imon, A.H.M.R. 2012. Decile mean: A new robust measure of central tendency. Chiang Mai Journal of Science 39(3): 478-485.
Robins, J.M. and Wang, N. 2000. Inference for imputation estimators. Biometrika 87: 113-124.
Schioler, H. and Hartmann, U. 1992. Mapping neural network derived from the Parzen window estimator. Neural Networks 5(6): 903-909.
Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R. and Altman, R.B. 2001. Missing value estimation methods for DNA microarrays. Bioinformatics 17(6): 520-525.
Vongprasert, J. 2019. Jacknife and Regression Approaches to Missing Data Imputation. Journal of Applied Statistics and Information Technology 3(1): 52-61.