ABiLSTM with BERT Embedding for Classification of Imbalanced COVID-19 Rumors
Abstract
The coronavirus emerged at the end of 2019 and has caused thousands of casualties all over the world. The pandemic has also been accompanied by loss of employment and economic downturn. Naturally, the pandemic and the lack of knowledge about the coronavirus have created public anxiety and panic. Social media platforms such as Twitter and Facebook, along with online news forums, now reach most people and have become popular channels for communication and information sharing. Unfortunately, they have also become easy targets for rumors and fake news. The rapid flow of rumors and misleading information about the coronavirus across these platforms has heightened public anxiety and fear. Consequently, rumor detection has become essential for the economy and public safety. In this context, the present research focused on detecting and classifying rumors so that precautionary measures can be taken. An attention-based BiLSTM (ABiLSTM) with BERT embeddings was proposed for rumor classification on the COVID-19 rumor dataset. The proposed classification model achieved an accuracy of 80.71% and a micro-F1 score of 90.85. Furthermore, the experimental results indicate that the proposed technique outperforms existing methods.
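To illustrate the attention step that an attention-based BiLSTM typically applies over its per-token hidden states, the following is a minimal sketch in plain Python. The abstract does not state which attention form the authors used, so the dot-product scoring, the single `query` vector, and the function names `softmax` and `attention_pool` are all assumptions made for illustration; the toy hidden states stand in for BiLSTM outputs computed from BERT embeddings.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Pool T hidden vectors into one context vector.

    hidden_states: list of T vectors, each of length d (stand-ins for
                   BiLSTM outputs; hypothetical toy values below).
    query:         length-d vector (a hypothetical learned parameter).
    Scoring by dot product is an assumption, not the paper's stated method.
    """
    # One relevance score per time step.
    scores = [sum(h_j * q_j for h_j, q_j in zip(h, query)) for h in hidden_states]
    # Normalize the scores into attention weights that sum to 1.
    weights = softmax(scores)
    # Weighted sum of the hidden states gives the context vector.
    dim = len(hidden_states[0])
    context = [
        sum(w * h[j] for w, h in zip(weights, hidden_states)) for j in range(dim)
    ]
    return weights, context


# Toy example: three time steps with two-dimensional hidden states.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
weights, context = attention_pool(hidden, query)
```

In a full model the context vector would be passed to a classification layer; here the third time step receives the largest weight because it aligns most closely with the query.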
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Copyright Transfer Statement
The copyright of this article is transferred to the Current Applied Science and Technology journal, with effect if and when the article is accepted for publication. The copyright transfer covers the exclusive right to reproduce and distribute the article, including reprints, translations, photographic reproductions, electronic form (offline, online), or any other reproductions of a similar nature.
The author warrants that this contribution is original and that he/she has full power to make this grant. The author signs for and accepts responsibility for releasing this material on behalf of any and all co-authors.