Building a Model and Evaluating the Performance for Colon Cancer Screening Using Histopathological Images

Lersak Phothong
Charanya Phanprasat
Patiparn Thongyu
Anupong Sukprasert

Abstract

Colon cancer remains a major global health concern. Early detection is critical for improving treatment outcomes. Although conventional diagnostic methods are generally reliable, they can be time-consuming and heavily dependent on the expertise of medical professionals. This study aims to evaluate the effectiveness of machine learning models in classifying colon cancer using 10,000 histopathological images obtained from www.kaggle.com. The data were analyzed following standard data mining procedures using four image classification techniques: Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), Neural Network (NN), and Decision Tree (DT). The results showed that the k-NN technique achieved the highest accuracy at 91.86%, along with the highest sensitivity and specificity values at 92.24% and 91.48%, respectively. These findings indicate that the k-NN technique is highly suitable for developing classification models for colon cancer, contributing to the creation of essential tools for early detection and supporting more effective treatment planning.
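The three reported metrics are tied together by the confusion matrix, and a short sketch can make that relationship concrete. The abstract does not report a confusion matrix, so the counts below are hypothetical: they assume the 10,000 images are split evenly into 5,000 cancerous and 5,000 benign samples, and are chosen only because they reproduce the reported k-NN percentages under that assumption. The class name `ConfusionMatrix` and its fields are illustrative, not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix; 'positive' means cancerous tissue."""
    tp: int  # cancerous images classified as cancerous
    fn: int  # cancerous images classified as benign
    tn: int  # benign images classified as benign
    fp: int  # benign images classified as cancerous

    @property
    def accuracy(self) -> float:
        # Share of all images that were classified correctly.
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total

    @property
    def sensitivity(self) -> float:
        # Share of cancerous images that were detected (recall).
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Share of benign images that were correctly cleared.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts, ASSUMING a balanced 5,000 / 5,000 split.
# They are not reported in the abstract.
cm = ConfusionMatrix(tp=4612, fn=388, tn=4574, fp=426)

print(f"accuracy:    {cm.accuracy:.2%}")
print(f"sensitivity: {cm.sensitivity:.2%}")
print(f"specificity: {cm.specificity:.2%}")
```

With equal class sizes, accuracy is the simple average of sensitivity and specificity, which is consistent with the reported 91.86% lying midway between 92.24% and 91.48%.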

How to Cite
Phothong, L., Phanprasat, C., Thongyu, P., & Sukprasert, A. (2025). Building a Model and Evaluating the Performance for Colon Cancer Screening Using Histopathological Images. Journal of Science Ladkrabang, 34(2), 117–135. Retrieved from https://li01.tci-thaijo.org/index.php/science_kmitl/article/view/261425
Section
Research article
