Comparison of Clustering Techniques for Cluster Analysis
Keywords:
cluster analysis, multivariate data, Kohonen’s Self-Organizing Maps, K-medoids, Dynamic Time Warping, K-means
Abstract
Cluster analysis is widely used to identify natural groupings in data across several domains. Various clustering methods have been proposed, but it is difficult to choose the method best suited to a given type of data. The objective of this research was therefore to compare the effectiveness of five clustering techniques on multivariate data: the hierarchical clustering method; the K-means clustering algorithm; Kohonen’s Self-Organizing Maps (SOM) method; the K-medoids method; and the K-medoids method integrated with the Dynamic Time Warping (DTW) distance measure. The five techniques were evaluated using the root mean square standard deviation (RMSSTD) and R-squared (RS). For RMSSTD, a lower value indicates a better technique; for RS, a higher value indicates a better technique. The techniques were evaluated using both real data and simulated data that followed a multivariate normal distribution. Each simulated dataset was generated by a Monte Carlo technique with a sample size of 100 and 1,000 replications, for 3, 5 and 7 variables. Solutions with 2, 3, 4, 5, 6, 7 and 8 clusters were studied. The real and simulated datasets gave the same result: the RMSSTD and RS values of the K-means clustering method were closest to those of the SOM method, and these two methods yielded the lowest RMSSTD and highest RS in all simulations. Hence, both K-means and SOM were considered to be the most suitable techniques for cluster analysis.
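The two evaluation indices can be illustrated with a minimal sketch. The abstract does not give the exact formulas used in the study, so the code below assumes the commonly used definitions: RMSSTD as the square root of the pooled within-cluster variance over all variables, and RS as the ratio of between-cluster to total sum of squares. The function names (`within_ss`, `rmsstd`, `r_squared`) and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def within_ss(X, labels):
    """Pooled within-cluster sum of squares, summed over all variables."""
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        total += ((members - members.mean(axis=0)) ** 2).sum()
    return total


def rmsstd(X, labels):
    """Root mean square standard deviation (assumed definition); lower is better."""
    n, p = X.shape
    k = np.unique(labels).size
    # Degrees of freedom: p * sum_k (n_k - 1) = p * (n - k)
    return np.sqrt(within_ss(X, labels) / (p * (n - k)))


def r_squared(X, labels):
    """RS = (total SS - within SS) / total SS (assumed definition); higher is better."""
    total_ss = ((X - X.mean(axis=0)) ** 2).sum()
    return (total_ss - within_ss(X, labels)) / total_ss


# Toy data (an assumption, not the study's data): two well-separated
# multivariate normal groups with 3 variables and 100 observations in total.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 3)), rng.normal(6.0, 1.0, (50, 3))])
true_labels = np.repeat([0, 1], 50)
shuffled_labels = rng.permutation(true_labels)  # deliberately poor partition

print("true partition:    ", rmsstd(X, true_labels), r_squared(X, true_labels))
print("shuffled partition:", rmsstd(X, shuffled_labels), r_squared(X, shuffled_labels))
```

Under these assumed definitions, a partition that matches the underlying groups should show a lower RMSSTD and a higher RS than the shuffled partition, which is the direction of comparison described in the abstract. Labels produced by any of the five clustering techniques could be passed to the same functions.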
License
online 2452-316X print 2468-1458/Copyright © 2022. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/), production and hosting by Kasetsart University of Research and Development Institute on behalf of Kasetsart University.