Adjusted K Nearest Neighbor Method Based on Decile Mean in Missing Data Imputation


  • Patchana Suwannasaen College of Research Methodology and Cognitive Science, Burapha University.
  • Pattrawadee Makmee College of Research Methodology and Cognitive Science, Burapha University.
  • Afifi Lateh Faculty of Education, Prince of Songkla University.


Keywords: missing data imputation, missing data, adjusted K Nearest Neighbor method


The objective of this research was to develop a new method for estimating missing data, Decile Mean K-Nearest Neighbor Bhattacharyya Imputation (DKNN-BH). The method adapts K-Nearest Neighbor Imputation (KNN) by incorporating the decile mean and the Bhattacharyya distance. Its effectiveness was compared with that of Mean Imputation (MI), K-Nearest Neighbor Imputation (KNN), and Decile Mean K-Nearest Neighbor Imputation (DKNN). A Monte Carlo simulation covered 300 cases defined by four factors: sample size, level of missing data, size of outliers, and the k constant for the DKNN-BH, KNN, and DKNN methods. Each case was replicated 500 times. The newly developed method, DKNN-BH, is derived by fine-tuning KNN with the decile mean and the Bhattacharyya distance, and it has two steps: calculating the Bhattacharyya distance and estimating the missing data with the decile mean method. In the comparison based on the simulation results, the new method (DKNN-BH) performed better than the older one in all cases, giving the lowest mean square error. The simulation results also showed that, for missing-data percentages of 5, 10, 20, 30, and 40, outlier percentages of 0, 5, 10, and 20, and k values of 11, 13, 15, 17, and 19, the lowest mean square error decreases as the percentage of outliers and the k constant decrease.
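
The two steps named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything here is an assumption rather than the authors' implementation: the Bhattacharyya distance is taken in its discrete form on rows rescaled to sum to 1 (which requires non-negative data), the decile mean is taken as the average of the nine deciles D1 to D9, and only complete rows serve as neighbors. The function names `bhattacharyya_distance`, `decile_mean`, and `dknn_bh_impute` are hypothetical.

```python
import numpy as np


def bhattacharyya_distance(p, q):
    """Discrete Bhattacharyya distance between two non-negative vectors.

    Assumption: each vector is rescaled to sum to 1 before comparison.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    bc = np.sum(np.sqrt(p * q))  # Bhattacharyya coefficient
    # Clamp at 0 to absorb rounding when bc lands slightly above 1.
    return max(0.0, -np.log(bc))


def decile_mean(values):
    """Average of the nine deciles D1..D9 (assumed definition)."""
    deciles = np.percentile(values, np.arange(10, 100, 10))
    return float(deciles.mean())


def dknn_bh_impute(X, k):
    """Fill NaN entries using the k nearest complete rows.

    Step 1: rank complete rows by Bhattacharyya distance, computed on
            the columns that are observed in the incomplete row.
    Step 2: replace each missing entry with the decile mean of the
            k nearest rows' values in that column.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    has_nan = np.isnan(X).any(axis=1)
    donors = X[~has_nan]  # complete rows only (assumption)
    for i in np.flatnonzero(has_nan):
        observed = ~np.isnan(X[i])
        dists = np.array(
            [bhattacharyya_distance(X[i, observed], d[observed]) for d in donors]
        )
        nearest = donors[np.argsort(dists)[:k]]
        for j in np.flatnonzero(~observed):
            filled[i, j] = decile_mean(nearest[:, j])
    return filled


# Toy data: the last row is missing its third value.
X_demo = np.array(
    [
        [1.0, 1.0, 10.0],
        [1.0, 1.0, 20.0],
        [1.0, 1.0, 30.0],
        [9.0, 1.0, 1000.0],  # outlying row, farthest from the target
        [1.0, 1.0, np.nan],
    ]
)
X_filled = dknn_bh_impute(X_demo, k=3)
```

In this toy example the outlying row is not among the three nearest neighbors, so it has no influence on the imputed value. The k values studied in the abstract (11 to 19) would need a correspondingly larger pool of complete rows.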


