Building a Model and Evaluating the Performance for Colon Cancer Screening Using Histopathological Images
Abstract
Colon cancer remains a major global health concern, and early detection is critical for improving treatment outcomes. Conventional diagnostic methods are generally reliable, but they are time-consuming and depend heavily on the expertise of medical professionals. This study evaluates the effectiveness of machine learning models for classifying colon cancer, using 10,000 histopathological images obtained from www.kaggle.com. The data were analyzed following standard data mining procedures with four classification techniques: Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), Neural Network (NN), and Decision Tree (DT). Of the four, k-NN achieved the highest accuracy (91.86%) as well as the highest sensitivity (92.24%) and specificity (91.48%). These findings indicate that k-NN is the most suitable of the compared techniques for building colon cancer classification models, which can support tools for early detection and more effective treatment planning.
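For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how accuracy, sensitivity, and specificity are computed for a two-class problem. It is not the study's own pipeline: the function name `binary_metrics`, the label encoding (1 = cancerous, 0 = benign), and the toy labels are assumptions made purely for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, and specificity for binary labels.

    Hypothetical helper for illustration; `positive` marks the class
    treated as "cancerous" (an assumed encoding, not from the study).
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1  # cancerous image correctly flagged
        elif truth == positive:
            fn += 1  # cancerous image missed
        elif pred == positive:
            fp += 1  # benign image wrongly flagged
        else:
            tn += 1  # benign image correctly cleared

    total = tp + fp + tn + fn
    return {
        # share of all images classified correctly
        "accuracy": (tp + tn) / total if total else float("nan"),
        # share of cancerous images that were detected (true positive rate)
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # share of benign images that were cleared (true negative rate)
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Toy labels only; these are not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In a screening setting, sensitivity describes how rarely a cancerous sample is missed, while specificity describes how rarely a benign sample triggers a false alarm, which is why both are reported alongside accuracy.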
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.