An Efficiency Comparison in Prediction of Imbalanced Data Classification with Data Mining Techniques

Saichon Sinsomboonthong (สายชล สินสมบูรณ์ทอง)

Abstract

This study compared the predictive efficiency of data mining techniques for classifying imbalanced data. Seven classification methods were examined: (1) the k-nearest neighbor method using the IBk algorithm; (2) the decision tree method using the J48 algorithm; (3) the neural network method using the multilayer perceptron algorithm; (4) the support vector machine method using a polynomial kernel; (5) the rule-based method using the decision table algorithm; (6) the binary logistic regression method; and (7) the naïve Bayes method. The methods were compared on accuracy, sensitivity, specificity, time, and mean square error (MSE) using the fertility, vertebral column, and diabetes data sets. The main results are as follows. For the fertility data set, the binary logistic regression method with random seed = 10, 20, and 30 showed the best accuracy, sensitivity, specificity, and MSE, at 100%, 1.0000, 1.0000, and 0.00000, respectively. For the vertebral column data set, the k-nearest neighbor method with random seed = 10, 20, and 30 showed the best accuracy, sensitivity, specificity, and MSE, at 100%, 1.0000, 1.0000, and 0.00024, respectively. For the diabetes data set, the k-nearest neighbor method with random seed = 10, 20, and 30 showed the best accuracy, sensitivity, specificity, and MSE, at 100%, 1.0000, 1.0000, and 0.00004, respectively. Across the three data sets, the k-nearest neighbor method was the best prediction method.
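The evaluation criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function name `evaluate_binary`, the toy labels, and the choice to compute MSE as the mean squared difference between the 0/1 label and the predicted probability of the positive class are all assumptions made for illustration, since the abstract does not define MSE.

```python
def evaluate_binary(y_true, y_pred, y_prob):
    """Return accuracy (%), sensitivity, specificity and MSE for 0/1 labels.

    y_true: actual class labels (1 = positive class, 0 = negative class)
    y_pred: predicted class labels
    y_prob: predicted probability of class 1 (assumed input for the MSE)
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    n = len(y_true)
    accuracy = 100.0 * (tp + tn) / n
    # Sensitivity: share of actual positives that were predicted positive.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of actual negatives that were predicted negative.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    # Assumed MSE definition: mean squared error of the class-1 probability.
    mse = sum((t - q) ** 2 for t, q in zip(y_true, y_prob)) / n

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mse": mse,
    }


# Toy imbalanced example (hypothetical data): 3 positives, 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.1, 0.2, 0.3, 0.1, 0.6]
result = evaluate_binary(y_true, y_pred, y_prob)
print(result)
```

On imbalanced data, accuracy alone can look high even when few minority-class instances are detected, which is one reason to report sensitivity and specificity alongside it.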

Article Details

Section
Physical Sciences
Author Biography

Saichon Sinsomboonthong (สายชล สินสมบูรณ์ทอง)

Department of Statistics, Faculty of Science, King Mongkut's Institute of Technology Ladkrabang, Chalongkrung Road, Lat Krabang District, Bangkok 10520
