Data clustering analysis to identify high-risk Thai individuals for domestic violence and mental health problems using machine learning
Abstract
Domestic violence and mental health problems remain critical public health concerns in Thailand, and their prevalence has risen during economic crises and the COVID-19 pandemic. Despite their significant societal impact, previous research has rarely applied data science methods to identify high-risk populations in depth. This study aims to identify Thai individuals at high risk of mental health problems and domestic violence by applying three unsupervised machine learning techniques: K-means, Hierarchical Clustering (HC), and Gaussian Mixture Models (GMMs). Clustering performance was assessed with three internal evaluation metrics: the Silhouette Score, the Calinski-Harabasz Index, and the Davies-Bouldin Index. The dataset comprised 1,162 records of domestic violence offenders obtained from the Digital Government Development Agency (DGA) in Thailand. HC achieved the best performance (Silhouette Score = 0.429, Calinski-Harabasz Index = 61.790, Davies-Bouldin Index = 1.034) and effectively differentiated the risk groups. The high-risk group consisted predominantly of middle-aged males with mental health issues, substance abuse, and economic stress. The study demonstrates the potential of machine learning for identifying vulnerable populations, and its findings can inform targeted prevention strategies, early warning systems, and evidence-based policymaking to mitigate domestic violence and promote sustainable mental well-being.
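The comparison described in the abstract can be illustrated with a minimal sketch using scikit-learn. It is not the authors' pipeline: the data are synthetic blobs standing in for the DGA records, and the helper name `compare_clusterers`, the choice of three clusters, Ward linkage, and feature standardization are all assumptions made for illustration.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def compare_clusterers(X, n_clusters=3, random_state=0):
    """Fit K-means, HC, and a GMM on X; return internal validity scores per method.

    Hypothetical helper: n_clusters=3 and Ward linkage are illustrative
    assumptions, not settings reported in the abstract.
    """
    X_scaled = StandardScaler().fit_transform(X)
    labelings = {
        "kmeans": KMeans(
            n_clusters=n_clusters, n_init=10, random_state=random_state
        ).fit_predict(X_scaled),
        "hierarchical": AgglomerativeClustering(
            n_clusters=n_clusters, linkage="ward"
        ).fit_predict(X_scaled),
        "gmm": GaussianMixture(
            n_components=n_clusters, random_state=random_state
        ).fit_predict(X_scaled),
    }
    scores = {}
    for name, labels in labelings.items():
        scores[name] = {
            "silhouette": silhouette_score(X_scaled, labels),
            "calinski_harabasz": calinski_harabasz_score(X_scaled, labels),
            "davies_bouldin": davies_bouldin_score(X_scaled, labels),
        }
    return scores


# Synthetic stand-in data; the real study used 1,162 offender records.
X, _ = make_blobs(
    n_samples=300, centers=3, n_features=4, cluster_std=0.5, random_state=0
)
scores = compare_clusterers(X)
for method, metrics in scores.items():
    print(method, {k: round(float(v), 3) for k, v in metrics.items()})
```

When reading the output, a higher Silhouette Score and Calinski-Harabasz Index indicate better-separated clusters, while a lower Davies-Bouldin Index is better. The three metrics may disagree on real data, so the method ranked best depends on how they are weighed.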