Classification of Human Emotion from Speech Recognition Using Deep Learning


Asst. Prof. Dr. Sarunya Kanjanawattana
Atsadayoot Jarat
Dr. Panchalee Praneetpholkrang

Abstract

Human emotions are complex mental processes that respond to surrounding stimuli. They form a mechanism that allows humans to adjust themselves and express their emotions in various situations. Because humans can manifest their emotions in diverse ways in any given situation, it is difficult to capture and understand their actual emotions. Predicting an interlocutor's emotions helps in deciding proper actions for specific situations, such as treating patients with depression or those who need psychotherapy. This study develops deep learning models that classify human emotions from speech. Human voices are classified into five emotional types: normal, anger, surprise, happiness, and sadness. The objectives of this study are to 1) compare the performance of two classification models, i.e., Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM), and 2) propose the most appropriate model for classifying human emotions from speech recognition. The results reveal that the classification performance of LSTM outperforms that of CNN. With LSTM, speech emotions are recognized across the five classes: normal, angry, surprised, happy, and sad.
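The approach described in the abstract treats an utterance as a sequence of acoustic feature frames that an LSTM consumes step by step, with a softmax over the five emotion classes at the end. The sketch below illustrates that idea only in miniature: a pure-Python LSTM cell with a scalar hidden state and random, untrained weights. The feature dimension, weight layout, and output layer are illustrative assumptions, not the authors' implementation.

```python
import math
import random

# The five emotion classes used in the study.
EMOTIONS = ["normal", "angry", "surprised", "happy", "sad"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x_t, h_prev, c_prev, w):
    """One LSTM time step with a scalar hidden state (toy size for clarity)."""
    z = x_t + [h_prev, 1.0]                    # input features, recurrent term, bias
    dot = lambda row: sum(a * b for a, b in zip(row, z))
    i = sigmoid(dot(w["i"]))                   # input gate
    f = sigmoid(dot(w["f"]))                   # forget gate
    o = sigmoid(dot(w["o"]))                   # output gate
    g = math.tanh(dot(w["g"]))                 # candidate cell value
    c = f * c_prev + i * g                     # new cell state
    h = o * math.tanh(c)                       # new hidden state
    return h, c

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def classify(frames, w, out_w):
    """Run the LSTM over acoustic feature frames and map the final
    hidden state to a probability for each emotion class."""
    h, c = 0.0, 0.0
    for x_t in frames:
        h, c = lstm_step(x_t, h, c, w)
    logits = [wo * h + bo for wo, bo in out_w]  # linear output layer
    return softmax(logits)

random.seed(0)
n_feat = 3                                      # assumed: 3 spectral features per frame
w = {k: [random.uniform(-1, 1) for _ in range(n_feat + 2)] for k in "ifog"}
out_w = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in EMOTIONS]

# A short, made-up utterance of three feature frames.
frames = [[0.2, -0.1, 0.5], [0.4, 0.0, -0.3], [0.1, 0.3, 0.2]]
probs = classify(frames, w, out_w)
prediction = EMOTIONS[probs.index(max(probs))]
print(prediction, [round(p, 3) for p in probs])
```

In practice the frames would be spectral features (e.g. MFCCs) extracted from the speech signal, the hidden state would be a vector, and the weights would be learned from labeled recordings; the gating equations and the final softmax over five classes are the same.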

Article Details

How to Cite
Kanjanawattana, S., Jarat, A., & Praneetpholkrang, P. (2022). Classification of Human Emotion from Speech Recognition Using Deep Learning. Journal of SciTech-ASEAN, 2(2), 1–11. Retrieved from https://li01.tci-thaijo.org/index.php/STJS/article/view/253403
Section
Research Article

References

Hammond, M. (2006). Evolutionary theory and emotions. In Stets, J.E. and Turner, J.H. (Eds.), Handbook of the Sociology of Emotions. New York: Springer, 368–385.

Tokuno, S., Tsumatori, G., Shono, S., Takei, E., Yamamoto, T., Suzuki, G., Mitsuyoshi, S. and Shimura, M. (2011). Usage of emotion recognition in military health care. In 2011 Defense Science Research Conference and Expo (DSR), 1–5.

Yamashita, Y., Onodera, M., Shimoda, K. and Tobe, Y. (2019). Visualizing health with emotion polarity history using voice. In Adjunct Proceedings of the 2019 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2019 ACM International Symposium on Wearable Computers, 1210–1213.

Kittichaiwatthana, P., Praneetpholkrang, P., and Kanjanawattana, S. (2020). Facial Expression Recognition using Deep Learning. SUT International Virtual Conference on Science and Technology, 41.

Song, I., Kim, H.J. and Jeon, P. (2014). Deep learning for real-time robust facial expression recognition on a smartphone. In 2014 IEEE International Conference on Consumer Electronics (ICCE), 564–567.

Dagar, D., Hudait, A., Tripathy, H.K. and Das, M.N. (2016). Automatic emotion detection model from facial expression. In 2016 International Conference on Advanced Communication Control and Computing Technologies (ICACCCT), 77–85.

Lugović, S., Dunder, I. and Horvat, M. (2016). Techniques and applications of emotion recognition in speech. In 2016 39th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), 1278–1283.

Xie, Y., Liang, R., Liang, Z., Huang, C., Zou, C. and Schuller, B. (2019). Speech emotion classification using attention-based LSTM. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(11), 1675–1685.

Shewalkar, A.N. (2018). Comparison of RNN, LSTM and GRU on speech recognition data (Master's thesis). North Dakota State University of Agriculture and Applied Science.

Rawat, W. and Wang, Z. (2017). Deep convolutional neural networks for image classification: A comprehensive review. Neural Computation, 29(9), 2352–2449.

Etienne, C., Fidanza, G., Petrovskii, A., Devillers, L. and Schmauch, B. (2018). CNN+LSTM architecture for speech emotion recognition with data augmentation. In Proceedings of the Workshop on Speech, Music and Mind (SMM 2018), 21–25.

Zhao, J., Mao, X. and Chen, L. (2019). Speech emotion recognition using deep 1D & 2D CNN LSTM networks. Biomedical Signal Processing and Control, 47, 312–323.