A Comparison of Prediction Performance for Imbalanced Data Classification Using Data Mining Techniques
Abstract
This study compared the predictive performance of data mining techniques for classifying imbalanced data. Seven classification methods were evaluated: (1) the k-nearest neighbor method using the IBk algorithm; (2) the decision tree method using the J48 algorithm; (3) the neural network method using the multilayer perceptron algorithm; (4) the support vector machine method using a polynomial kernel; (5) the rule-based method using the decision table algorithm; (6) the binary logistic regression method; and (7) the naïve Bayes method. The methods were compared on accuracy, sensitivity, specificity, running time, and mean square error (MSE) using the fertility, vertebral column, and diabetes data sets. The main results are as follows. For the fertility data set, binary logistic regression with random seeds of 10, 20, and 30 gave the best accuracy, sensitivity, specificity, and MSE, at 100%, 1.0000, 1.0000, and 0.00000, respectively. For the vertebral column data set, the k-nearest neighbor method with random seeds of 10, 20, and 30 gave the best accuracy, sensitivity, specificity, and MSE, at 100%, 1.0000, 1.0000, and 0.00024, respectively. For the diabetes data set, the k-nearest neighbor method with random seeds of 10, 20, and 30 gave the best accuracy, sensitivity, specificity, and MSE, at 100%, 1.0000, 1.0000, and 0.00004, respectively. Across the three data sets, the k-nearest neighbor method was the best prediction method overall.
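As an illustration of how the comparison criteria named in the abstract can be computed for a binary classifier, the following is a minimal sketch. It is not the authors' code, and the abstract does not give their formulas. The function name `binary_metrics`, the 0.5 decision threshold, the choice of class 1 as the positive (minority) class, and the definition of MSE as the mean squared difference between the 0/1 label and the predicted positive-class probability are all assumptions made for this example. The toy labels and probabilities are invented and do not come from the study's data sets.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and MSE for binary labels.

    y_true holds 0/1 labels (1 = positive class, assumed to be the minority).
    y_prob holds the predicted probability of the positive class.
    """
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("y_true and y_prob must be non-empty and equal length")

    tp = fp = tn = fn = 0
    squared_error = 0.0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1 and predicted == 0:
            fn += 1
        elif label == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
        # Assumed MSE definition: squared gap between label and probability.
        squared_error += (label - prob) ** 2

    n = len(y_true)
    return {
        "accuracy": (tp + tn) / n,
        # Sensitivity: share of actual positives that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of actual negatives that were rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "mse": squared_error / n,
    }


# Invented, imbalanced toy data: 2 positives among 10 instances.
toy_labels = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
toy_probs = [0.9, 0.1, 0.2, 0.0, 0.3, 0.6, 0.1, 0.2, 0.4, 0.0]
toy_metrics = binary_metrics(toy_labels, toy_probs)
for name, value in toy_metrics.items():
    print(f"{name}: {value:.4f}")
```

On imbalanced data, accuracy alone can look high even when the minority class is missed, which is why sensitivity and specificity are reported alongside it: in the toy data above, accuracy is 0.8 while sensitivity is only 0.5.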