Assessing the Performance of Data Mining Techniques for Breast Cancer Patient Screening

Anupong Sukprasert
Sirinapa Phomsopa
Yossapat Srimo

Abstract

This study evaluated the performance of several data mining techniques for building predictive models for breast cancer screening. Seven classification methods were compared: neural network, support vector machine (SVM), Naïve Bayes, k-nearest neighbors (k-NN), decision tree, deep learning, and ensemble vote. The dataset comprised 569 patient records originating from the University of Wisconsin and publicly available on www.kaggle.com. The analysis followed the CRISP-DM process, including variable selection, handling of missing data, and definition of each attribute's role. The neural network technique performed best, achieving an accuracy of 98.07%, a sensitivity of 99.15%, a specificity of 96.21%, and an overall efficiency of 98.47%. These findings indicate that this technique can support the early detection and diagnosis of breast cancer.
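
As a rough illustration of how accuracy, sensitivity, and specificity are conventionally derived from a binary confusion matrix, a minimal Python sketch follows. It is not the authors' RapidMiner workflow. The class name `ConfusionCounts`, the choice of the cancer case as the positive class, and the counts in the example are all hypothetical and are not taken from the study. The "overall efficiency" figure is not defined in the abstract, so it is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (assumption: positive = cancer case)."""
    tp: int  # positive cases correctly predicted positive
    fn: int  # positive cases wrongly predicted negative
    fp: int  # negative cases wrongly predicted positive
    tn: int  # negative cases correctly predicted negative


def accuracy(c: ConfusionCounts) -> float:
    """Share of all records that were classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.fp + c.tn)


def sensitivity(c: ConfusionCounts) -> float:
    """Share of actual positives that were detected (true positive rate)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of actual negatives that were cleared (true negative rate)."""
    return c.tn / (c.tn + c.fp)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not results from the study.
    example = ConfusionCounts(tp=90, fn=10, fp=5, tn=95)
    print(f"accuracy:    {accuracy(example):.2%}")
    print(f"sensitivity: {sensitivity(example):.2%}")
    print(f"specificity: {specificity(example):.2%}")
```

In a screening setting, sensitivity is often weighted more heavily than accuracy, because a missed positive case is usually costlier than a false alarm.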

Article Details

How to Cite
Sukprasert, A., Phomsopa, S., & Srimo, Y. (2025). Assessing the Performance of Data Mining Techniques for Breast Cancer Patient Screening. Journal of Science Ladkrabang, 34(2), 41–58. Retrieved from https://li01.tci-thaijo.org/index.php/science_kmitl/article/view/267178
Section
Research article
