Comparing K-Mean Clustering Methods of DNA in Brain Tumors for High-Dimensional Data

Autcha Araveeporn
Jarawee Promsanga

Abstract

This study compares the performance of three k-means methods, namely the Hartigan-Wong, Forgy, and MacQueen methods, in clustering the DNA of brain tumor patients. The independent variables are 989 genes, and the dependent variable is the level of brain tumor in 43 patients. Because the number of independent variables exceeds the number of patients, the data are high-dimensional. The experiment randomly samples 200, 400, 600, and 800 genes and fixes the number of groups at 5, 10, 15, 20, 25, and 30, with 1,000 replications for each setting. Clustering performance is compared using the mean difference in data between groups as the criterion. The results show that the Hartigan-Wong method performs best in all situations: it yields the largest difference in data between groups compared with the Forgy and MacQueen methods. The number of independent variables does not affect clustering performance.
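The experimental design described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the data are random placeholders with the same shape as the study (43 patients, 989 genes), the function names are hypothetical, and the criterion is assumed here to be the ratio of between-cluster to total sum of squares, since the abstract does not define "mean data differences between groups" precisely. Only a batch (Forgy-style) update and a simplified online (MacQueen-style) update are shown; the Hartigan-Wong method is not implemented.

```python
import numpy as np

def forgy_kmeans(X, k, rng, max_iter=100):
    """Batch k-means: reassign all points, then recompute all centers."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Squared Euclidean distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

def macqueen_kmeans(X, k, rng, n_passes=10):
    """Simplified online k-means: update a center after each single point."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    counts = np.ones(k)
    for _ in range(n_passes):
        for i in rng.permutation(len(X)):
            j = ((centers - X[i]) ** 2).sum(axis=1).argmin()
            counts[j] += 1
            centers[j] += (X[i] - centers[j]) / counts[j]  # running mean
    # Final assignment of every point to its nearest center.
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def between_ss_ratio(X, labels):
    """Assumed criterion: between-cluster sum of squares / total sum of squares."""
    grand = X.mean(axis=0)
    total = ((X - grand) ** 2).sum()
    between = 0.0
    for j in np.unique(labels):
        members = X[labels == j]
        between += len(members) * ((members.mean(axis=0) - grand) ** 2).sum()
    return between / total

def run_experiment(X, n_genes, k, n_reps, seed=0):
    """Average the criterion over repeated random gene subsets."""
    rng = np.random.default_rng(seed)
    scores = {"forgy": [], "macqueen": []}
    for _ in range(n_reps):
        cols = rng.choice(X.shape[1], size=n_genes, replace=False)
        sub = X[:, cols]
        scores["forgy"].append(between_ss_ratio(sub, forgy_kmeans(sub, k, rng)))
        scores["macqueen"].append(between_ss_ratio(sub, macqueen_kmeans(sub, k, rng)))
    return {name: float(np.mean(v)) for name, v in scores.items()}

# Placeholder data (NOT the study's data): 43 patients by 989 genes.
X = np.random.default_rng(42).normal(size=(43, 989))
result = run_experiment(X, n_genes=200, k=5, n_reps=20)
print(result)
```

Looping `run_experiment` over the gene counts (200, 400, 600, 800) and group counts (5 to 30) with `n_reps=1000` would mirror the grid in the abstract. A higher ratio means the groups are further apart relative to the total spread, which is the sense in which one method would be said to separate the data better under this assumed criterion.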

Article Details

How to Cite
Araveeporn, A., & Promsanga, J. (2023). Comparing K-Mean Clustering Methods of DNA in Brain Tumors for High-Dimensional Data. Journal of Science Ladkrabang, 32(2), 67–79. Retrieved from https://li01.tci-thaijo.org/index.php/science_kmitl/article/view/256290
Section
Research article

References

Zarikas, V., Poulopoulos, S.G., Gareiou, Z. and Zervas, E. 2020. Clustering analysis of countries using the COVID-19 cases dataset. Data in Brief, 31, 1-8.

Nurlaila, I., Irawati, W., Purwandari, K. and Pardamean, B. 2021. K-Means Clustering Model to Discriminate Copper-Resistant Bacteria as Bioremediation Agents. Procedia Computer Science, 179, 804-812.

Shan, P. 2018. Image segmentation method based on K-mean algorithm. EURASIP Journal on Image and Video Processing, 81, 1-9.

Hartigan, J.A. and Wong, M.A. 1979. Algorithm AS 136: A K-means Clustering Algorithm. Applied Statistics, 28, 100-108.

MacQueen, J. 1967. Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, 281-297.

Forgy, E.W. 1965. Clustering Analysis of Multivariate Data: Efficiency vs Interpretability of Classifications. Biometrics, 21, 768-769.

Lloyd, S.P. 1982. Least Squares Quantization in PCM. IEEE Transactions on Information Theory, 28, 128-137.

Yadav, J. and Sharma, M. 2013. A Review of K-mean Algorithm. International Journal of Engineering Trends and Technology, 4(7), 2972-2976.

Singh, R. P. and Rajpoot, D. S. 2019. Efficient Identification of Initial Clusters Centers for Partitioning Clustering Methods. 2019 Fifth International Conference on Image Processing, Shimla, India, 131-136.

Thammano, A., Wangkid, M. and Thammano, A. 2020. Breast Cancer Prediction Using K-mean Classification Algorithm with Self-adaptive Weight. Journal of Information Science and Technology, 10(2), 1-9. (in Thai)

Jothi, R., Mohanty, S. K. and Ojha, A. 2017. DK-means: a deterministic K-means clustering algorithm for gene expression analysis. Pattern Analysis and Applications, 22, 649-667.

Saadeh, H., Al Fayez, R. Q. and Elshqeirat, B. 2020. Application of K-Means Clustering to Identify Similar Gene Expression Patterns during Erythroid Development. International Journal of Machine Learning and Computing, 10(3), 452-457.

Joshi, R., Prasad, R., Mewada, P. and Saurabh. 2020. Modified LDA Approach For Cluster Based Gene Classification Using K-Mean Method. Procedia Computer Science, 171, 2493-2500.

Bhatt, V., Dhakar, M. and Chaurasia, B. K. 2016. Filtered Clustering Based on Local Outlier Factor in Data Mining. International Journal of Database Theory and Application, 9(5), 275-282.

Thakare, Y.S. and Bagal, S.B. 2015. Performance Evaluation of K-means Clustering Algorithm with Various Distance. International Journal of Computer Applications, 110, 12-16.

Meng, Y., Liang, J., Cao, F. and He, Y. 2018. A New distance with derivative information for functional k-means clustering. Information Sciences, 463-464, 166-185.

Morissette, L. and Chartier, S. 2013. The k-means clustering technique: General considerations and implementation in Mathematica. Tutorials in Quantitative Methods for Psychology, 9(1), 15-24.