A Comparison of the Predictive Performance of Classification with Missing Data Using Data Mining Techniques

จิตกานต์ จันทราช; มนทิราลัย ชัยมงคล; รัตนชัย แซ่โง้ว; สายทิพย์ พลอยสัมฤทธิ์; สายชล สินสมบูรณ์ทอง

doi:10.14456/tjst.2020.2

Published: Dec 16, 2019

Keywords:

missing data; K-nearest neighbor; decision tree; artificial neural network; support vector machine

Abstract

The objective of this research was to compare the efficiency of four classification methods (K-nearest neighbor, decision tree, artificial neural network, and support vector machine) on three datasets containing missing data. The datasets were a dataset of incidents of liver disease in Andhra Pradesh, India; a dataset of annual incomes and expenditures of Filipino families; and a dataset of issued and non-issued credit cards by a bank. Missing values were replaced using five replacement methods offered in the SPSS software program: series mean, mean of nearby points, median of nearby points, linear interpolation, and linear trend at a point. The efficiency of a classification method was measured by its prediction accuracy and its mean squared error of classification. Each dataset was divided into three subsets, a learning set, a validation set, and a test set, at a ratio of 70 : 20 : 10. The liver disease dataset had 1.89 percent missing data, the smallest amount of the three; for this dataset, the highest mean prediction accuracy and the lowest mean of the mean squared error were obtained with the artificial neural network method with missing data replaced by the mean of nearby points method. The dataset of annual incomes and expenditures of Filipino families had 4.21 percent missing data, a moderate amount; for this dataset, the most accurate outcomes were from the artificial neural network method with missing data replaced by the linear interpolation method. The dataset of issued and non-issued credit cards by a bank had 9.72 percent missing data, the largest amount; for this dataset, the most accurate outcomes were from the artificial neural network method with missing data replaced by the series mean method.
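As an illustration of the five replacement methods named in the abstract, the following is a minimal Python sketch, not the SPSS implementation used in the study. The function names, the default span of 2 nearby points, the use of the row position as the predictor for the linear trend, and the handling of edge cases are all assumptions made for this example; SPSS may treat gaps at the ends of a series or short spans differently.

```python
from statistics import mean, median


def series_mean(values):
    """Replace each missing value (None) with the mean of all observed values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def _neighbors(values, i, span):
    """Up to `span` observed values on each side of position i (assumed rule)."""
    before = [v for v in values[:i] if v is not None][-span:]
    after = [v for v in values[i + 1:] if v is not None][:span]
    return before + after


def nearby(values, span=2, agg=mean):
    """Replace each None with agg() of nearby observed values.

    Pass agg=mean for "mean of nearby points" or agg=median for
    "median of nearby points". Raises StatisticsError if a gap has
    no observed neighbors at all.
    """
    return [agg(_neighbors(values, i, span)) if v is None else v
            for i, v in enumerate(values)]


def linear_interpolation(values):
    """Interpolate between the closest observed values around each gap.

    Gaps at the start or end of the series are left as None.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is not None and right is not None:
            weight = (i - left) / (right - left)
            out[i] = values[left] + weight * (values[right] - values[left])
    return out


def linear_trend(values):
    """Fit a least-squares line to (position, value) and fill gaps from it."""
    points = [(i, v) for i, v in enumerate(values) if v is not None]
    x_bar = mean(x for x, _ in points)
    y_bar = mean(y for _, y in points)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in points)
             / sum((x - x_bar) ** 2 for x, _ in points))
    return [y_bar + slope * (i - x_bar) if v is None else v
            for i, v in enumerate(values)]


# Hypothetical column with one missing value, for illustration only.
column = [1.0, 2.0, None, 7.0, 10.0]
filled = {
    "series mean": series_mean(column),
    "mean of nearby points": nearby(column, agg=mean),
    "median of nearby points": nearby(column, agg=median),
    "linear interpolation": linear_interpolation(column),
    "linear trend at point": linear_trend(column),
}
```

Each entry of `filled` holds the same column with the gap replaced by a different rule, which is the kind of input variation the study compares across the four classifiers.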

How to Cite

จันทราช จ., ชัยมงคล ม., แซ่โง้ว ร., พลอยสัมฤทธิ์ ส., & สินสมบูรณ์ทอง ส. (2019). A Comparison of the Predictive Performance of Classification with Missing Data Using Data Mining Techniques. Thai Journal of Science and Technology, 9(1), 1–15. https://doi.org/10.14456/tjst.2020.2

Issue

Vol. 9 No. 1 (2020): January-February

Section

Physical Sciences

Published articles are the copyright of the Faculty of Science and Technology, Thammasat University. The statements appearing in each article of this journal are solely the personal opinions of the authors and have no connection with the Faculty of Science and Technology or other faculty members of Thammasat University. Authors must affirm their responsibility for every statement presented in their articles, should there be any errors or inaccuracies.

Author Biographies