A Performance Assessment of Repeated Jenks Natural Breaks Classification on Univariate Data
Abstract
Jenks natural breaks classification is a widely used data clustering method. This research studies a modified version, repeated Jenks natural breaks classification, which increases the number of groups used for clustering until the first break changes by less than a specified percentage relative to the previous clustering. The first break is then used to split the data into two groups. We assess the performance of repeated Jenks natural breaks classification against Jenks natural breaks classification, head/tail breaks, and the EM algorithm on simulated univariate data drawn from 2-group normal mixture and 2-group log-normal mixture distributions. Performance is assessed by clustering accuracy. We find that repeated Jenks natural breaks classification is not suitable for maximizing overall accuracy on normal mixture data, but it can be used on log-normal mixture data when the group proportions are roughly equal or lean toward the higher-mean group. Repeated Jenks natural breaks classification can also be used when users need to prioritize the accuracy of the higher-mean group.
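The repeated procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made here for illustration: the function names `jenks_first_break` and `repeated_jenks_split`, the starting point of 2 groups, the default tolerance `tol=0.05`, the cap `max_k`, the definition of the first break as the largest value in the lowest class, and measuring the change relative to the previous first break. The Jenks step is written as a plain dynamic program that minimizes the total within-class sum of squared deviations, which is adequate only for small samples.

```python
from itertools import accumulate


def jenks_first_break(values, k):
    """Largest value of the lowest class in the k-class partition that
    minimizes the total within-class sum of squared deviations."""
    x = sorted(float(v) for v in values)
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")
    # Prefix sums give the squared deviation of any slice in O(1).
    s1 = [0.0] + list(accumulate(x))
    s2 = [0.0] + list(accumulate(v * v for v in x))

    def ssd(i, j):
        # Sum of squared deviations from the mean of x[i:j].
        d = s1[j] - s1[i]
        return s2[j] - s2[i] - d * d / (j - i)

    inf = float("inf")
    # cost[c][j]: best cost of splitting x[:j] into c classes.
    # cut[c][j]: start index of the last class in that best split.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = cost[c - 1][i] + ssd(i, j)
                if cand < cost[c][j]:
                    cost[c][j], cut[c][j] = cand, i
    # Walk back from the last class to the start of the second class.
    j = n
    for c in range(k, 1, -1):
        j = cut[c][j]
    return x[j - 1]


def repeated_jenks_split(values, tol=0.05, max_k=10):
    """Increase the number of groups until the first break changes by less
    than tol (as a fraction of the previous first break), then return that
    break and the number of groups used. max_k is a hypothetical safety cap."""
    prev = jenks_first_break(values, 2)
    for k in range(3, max_k + 1):
        cur = jenks_first_break(values, k)
        if abs(cur - prev) < tol * abs(prev):
            return cur, k
        prev = cur
    return prev, max_k


threshold, groups_used = repeated_jenks_split([1, 1, 1, 1, 5, 5, 5, 9, 9, 9])
lower = [v for v in [1, 1, 1, 1, 5, 5, 5, 9, 9, 9] if v <= threshold]
```

Under these assumptions, observations at or below the returned threshold form the lower group and the rest form the higher group. Because the first break tends to move downward as more groups are added, more observations are assigned to the higher group, which is consistent with the abstract's remark about prioritizing the accuracy of the higher-mean group.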