Data clustering analysis to identify high-risk Thai individuals for domestic violence and mental health problems using machine learning


Wutthiphong Khuandin
Chanida Kaewphet

Abstract

Domestic violence and mental health problems remain critical public health concerns in Thailand, and their prevalence rose during economic crises and the COVID-19 pandemic. Despite the significant societal impact, few previous studies have used data science methods to identify high-risk populations in depth. This study aims to group Thai individuals at high risk of mental health problems and domestic violence by applying three unsupervised machine learning techniques: K-means, Hierarchical Clustering (HC), and Gaussian Mixture Models (GMMs). Clustering performance was assessed with three internal evaluation metrics: the Silhouette Score, the Calinski-Harabasz Index, and the Davies-Bouldin Index. The dataset comprised 1,162 records of domestic violence offenders obtained from the Digital Government Development Agency (DGA) in Thailand. The findings indicate that HC achieved the highest performance (Silhouette Score = 0.429, Calinski-Harabasz Index = 61.790, and Davies-Bouldin Index = 1.034) and effectively differentiated the risk groups. The high-risk group consisted predominantly of middle-aged males with mental health issues, substance abuse, and economic stress. This study demonstrates the potential of machine learning for identifying vulnerable populations. Its findings can inform targeted prevention strategies, early warning systems, and evidence-based policymaking to mitigate domestic violence and promote sustainable mental well-being.
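
To make the three internal evaluation metrics concrete, the following is a minimal from-scratch sketch in Python. It is not the study's code: the six 2-D points and the two candidate labelings (`good` and `poor`) are made up for illustration, and the helper names are hypothetical. In practice, scikit-learn provides equivalent functions (`silhouette_score`, `calinski_harabasz_score`, `davies_bouldin_score` in `sklearn.metrics`).

```python
from math import dist
from statistics import mean


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(mean(axis) for axis in zip(*points))


def group_by_label(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters


def silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all points (Euclidean distance)."""
    clusters = group_by_label(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(mean([dist(p, q) for q in other])
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b))
    return mean(scores)


def calinski_harabasz(points, labels):
    """Between-cluster dispersion over within-cluster dispersion."""
    clusters = list(group_by_label(points, labels).values())
    n, k = len(points), len(clusters)
    overall = centroid(points)
    between = sum(len(c) * dist(centroid(c), overall) ** 2 for c in clusters)
    within = sum(dist(p, centroid(c)) ** 2 for c in clusters for p in c)
    return (between / (k - 1)) / (within / (n - k))


def davies_bouldin(points, labels):
    """Average, over clusters, of the worst spread-to-separation ratio."""
    clusters = list(group_by_label(points, labels).values())
    centers = [centroid(c) for c in clusters]
    spread = [mean([dist(p, ctr) for p in c])
              for c, ctr in zip(clusters, centers)]
    worst = [max((spread[i] + spread[j]) / dist(centers[i], centers[j])
                 for j in range(len(clusters)) if j != i)
             for i in range(len(clusters))]
    return mean(worst)


# Made-up toy data: two tight groups of three points each (not the DGA data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good = [0, 0, 0, 1, 1, 1]  # labels that follow the two groups
poor = [0, 1, 0, 1, 0, 1]  # labels that mix the two groups

for name, labels in (("good", good), ("poor", poor)):
    print(name,
          round(silhouette(points, labels), 3),
          round(calinski_harabasz(points, labels), 3),
          round(davies_bouldin(points, labels), 3))
```

Higher Silhouette and Calinski-Harabasz values and a lower Davies-Bouldin value indicate more compact, better-separated clusters, which is the sense in which the reported HC scores are compared with those of K-means and GMMs.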

Article Details

Section
Original Articles
