@article{ผลจันทร์_ใจมีธรรม_สินสมบูรณ์ทอง_2020, place={Pathumthani, Thailand}, title={A Comparison of Clustering Method Efficiency on Data with Outliers in Data Mining}, volume={9}, url={https://li01.tci-thaijo.org/index.php/tjst/article/view/248267}, DOI={10.14456/tjst.2020.64}, abstractNote={<p>Our research objective was to evaluate the efficacy of different types of hierarchical and non-hierarchical clustering methods on five well-known data sets with differing characteristics and proportions of outliers. Each of the three types of hierarchical clustering adopted a different linkage criterion, i.e., single-linkage, complete-linkage, or average-linkage clustering, and each type could use any of three metrics: Euclidean, Manhattan, or Chebyshev distance. The non-hierarchical method was k-means clustering using one of two metrics: Euclidean or Manhattan distance. All data sets were pre-processed with WEKA software, and their outliers were detected with SPSS software. The five data sets were heart disease (1.39% outliers), breast cancer (2.28%), cardiovascular disease (3.43%), diabetes (4.02%), and insurance claim (5.53%) data sets. Both clustering methods were run on the five data sets, and their clustering accuracy was evaluated. For each data set, the most effective type of each method was selected according to its clustering accuracy. 
For the hierarchical clustering method, the most effective linkage was single linkage for the cardiovascular disease, diabetes, and insurance claim data sets, and average linkage for the heart disease and breast cancer data sets; the most effective metric was Manhattan distance for the heart disease, cardiovascular disease, and diabetes data sets, Euclidean distance for the breast cancer data set, and Chebyshev distance for the insurance claim data set. For the non-hierarchical method (k-means clustering), the most effective metric was Euclidean distance for the breast cancer, cardiovascular disease, and insurance claim data sets, and Manhattan distance for the heart disease and diabetes data sets.</p>}, number={5}, journal={Thai Journal of Science and Technology}, author={ผลจันทร์ ณัฐวรรณ and ใจมีธรรม ปาริฉัตร and สินสมบูรณ์ทอง สายชล}, year={2020}, month={Dec.}, pages={589--602} }