Weighted Voting Ensemble for Depressive Disorder Analysis with Multi-objective Optimization

Wongpanya Nuankaew
Pratya Nuankaew
Damrongdet Doenribram
Chatklaw Jareanpon*

Abstract

Twitter is a popular platform that researchers widely use to collect data on users’ personal lives, feelings, and emotions. These data sets can be analyzed further with text mining techniques to predict depressive disorder. The American Psychiatric Association classifies nine symptoms of depression under the DSM-5 criteria, and these symptoms can be difficult to identify effectively. The unweighted voting ensemble is not practical for multi-class data. Therefore, this research proposes multi-objective optimization algorithms for depressive symptom prediction modeling (MOADSP) for the weighted voting ensemble, which can improve its effectiveness compared to a single model. The objectives of this research were 1) to find the appropriate number of features; 2) to improve the weights of the prediction models based on the recall of the class for the ensemble; and 3) to compare the performance of the single, unweighted, and weighted voting ensemble models for depressive disorder. Information gain was used to select the features. The single classification techniques tested in the experiment were Naïve Bayes, Random Forest, and K-Nearest Neighbor (KNN), while the voting ensemble models were the unweighted and weighted models. MOADSP was applied to the weighted voting ensemble models. For the training model, the results showed that KNN achieved the best recall among the single classifiers (98.60%) and that the AVG TP weighted model achieved the highest recall among the voting ensembles (98.43%). For testing, the AVG TP weighted model achieved the highest recall for the depressive class (80.00%). The proposed method was beneficial for the prediction of depressive disorder.
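
The idea of weighting each model's vote by its recall on a class can be illustrated with a minimal sketch. The abstract does not give the MOADSP weighting formula or define the AVG TP weighted scheme, so the sketch below simply assumes that a model's vote for a class counts as much as that model's recall on that class, measured on held-out data. The function names, class labels, and toy predictions are all hypothetical and are not taken from the study.

```python
from collections import defaultdict


def per_class_recall(y_true, y_pred, classes):
    """Recall per class: correctly predicted members / actual members."""
    recall = {}
    for c in classes:
        actual = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recall[c] = hits / actual if actual else 0.0
    return recall


def weighted_vote(model_preds, model_weights):
    """Return the class with the largest summed vote weight for one sample.

    model_preds   -- one predicted label per model
    model_weights -- one {class: weight} dict per model (assumed scheme:
                     the weight is the model's recall on that class)
    """
    scores = defaultdict(float)
    for pred, weights in zip(model_preds, model_weights):
        scores[pred] += weights.get(pred, 0.0)
    # Sorting first makes ties resolve to the alphabetically first class.
    return max(sorted(scores), key=lambda c: scores[c])


# Hypothetical validation labels and predictions from three models.
classes = ["depressive", "non-depressive"]
d, n = classes
y_true = [d, d, d, d, n, n, n, n]
preds_a = [d, d, d, d, n, n, d, d]
preds_b = [d, d, n, n, n, d, d, d]
preds_c = [d, n, n, n, n, n, d, d]

weights = [per_class_recall(y_true, p, classes)
           for p in (preds_a, preds_b, preds_c)]
uniform = [{c: 1.0 for c in classes} for _ in range(3)]

# A new sample on which the three models disagree.
votes = [d, n, n]
unweighted_result = weighted_vote(votes, uniform)
weighted_result = weighted_vote(votes, weights)
print(unweighted_result, weighted_result)
```

In this toy case the plain majority vote picks "non-depressive", while the recall-weighted vote follows the one model that is reliable on the depressive class. This is the kind of behavior a weighted ensemble is meant to enable when per-class recall matters more than overall agreement.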


Keywords: weighted voting ensemble; multi-objective optimization; major depressive disorder; text classification; depressive disorder analysis


*Corresponding author: Tel.: (+66) 985951653
E-mail: chatklaw.j@msu.ac.th

Article Details

Section
Original Research Articles
