The classification of diabetic patients using machine learning method by feature selection

Sayan Tepdang

Abstract

Classifying patients as diabetic or non-diabetic is difficult because the features are varied and many of them are needed to diagnose the symptoms of diabetes. This research proposes to classify patients by whether or not they are diabetic, using machine learning with feature selection to identify the relevant factors. The study used data on 536 people from the website https://www.kaggle.com, covering 8 features associated with diabetes: number of pregnancies, blood glucose, blood pressure, skin thickness, blood insulin, body mass index, diabetes pedigree function, and age. With training-to-testing ratios of 90:10, 80:20, 70:30, 60:40, and 50:50, and with the data set split for 10-fold cross-validation, the results showed that the best-performing method, Gradient Boosted Trees, achieved an efficiency of 87.14% with a standard deviation of 0.80. The best feature selection result came from the filter-based selection method with a Decision Tree, which retained only 4 factors: blood glucose, age, number of pregnancies, and blood insulin. Classifying diabetes effectively from these factors could help patients be treated sooner, recover faster, and live longer.
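The abstract does not describe how the filter-based selection or the 10-fold split was implemented, so the following is only a minimal sketch under stated assumptions, not the author's code. It scores each feature on its own by the largest Gini impurity reduction a single threshold can achieve (a one-split stand-in for a decision-tree-based filter), ranks the features, and splits the row indices into 10 folds. The data are synthetic, the labelling rule (glucose above 125) is a toy assumption, and all function and variable names are hypothetical.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split_gain(values, labels):
    """Largest Gini impurity reduction from one threshold on one feature."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    parent = gini([y for _, y in pairs])
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best

def rank_features(rows, labels, names):
    """Filter step: score every feature independently, best first."""
    scores = {
        name: best_split_gain([row[j] for row in rows], labels)
        for j, name in enumerate(names)
    }
    return sorted(scores, key=scores.get, reverse=True)

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# Synthetic data (assumption): three hypothetical features per patient.
rng = random.Random(42)
names = ["glucose", "age", "noise"]
rows = [
    [rng.uniform(70, 200), rng.uniform(20, 80), rng.uniform(0, 1)]
    for _ in range(200)
]
# Toy labelling rule (assumption): diabetic when glucose exceeds 125.
labels = [1 if row[0] > 125 else 0 for row in rows]

ranked = rank_features(rows, labels, names)
folds = k_fold_indices(len(rows), k=10)
print("feature ranking:", ranked)
print("fold sizes:", [len(fold) for fold in folds])
```

In a full experiment, the top-ranked features would be passed to a classifier such as gradient boosted trees, with each fold serving once as the test set while the other nine are used for training.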

Article Details

How to Cite
Tepdang, S. (2023). The classification of diabetic patients using machine learning method by feature selection. RMUTSB ACADEMIC JOURNAL, 11(1), 29–44. Retrieved from https://li01.tci-thaijo.org/index.php/rmutsb-sci/article/view/257786
Section
Research Article
