BM5-SP-SC: A Dual Model Architecture for Contradiction Detection on Crowdfunding Projects

Wenting Hou
Jian Qu*

Abstract

Despite the prevalence of scams in crowdfunding, little research has addressed the identification of fraudulent or infeasible crowdfunding projects. Because detecting such projects is challenging, most existing research on fake information has focused on detecting fake news or fake charity crowdfunding projects on social media. To address this gap, we focused on detecting fraudulent crowdfunding projects through knowledge extraction and contradiction detection. We proposed a novel method called BM5-SP-SC (BERT-MT5-Sentence Pattern-Sentiment Classification). BM5 (BERT-MT5), which combines a key-BERT model with a fine-tuned MT5 transformer, was used to extract feature information from crowdfunding projects. We also proposed a novel MT5 training method to construct an adaptive BM5 model. The keywords extracted by the adaptive BM5 model reached a correct rate of up to 72.7%, a recall of 100%, and an F-measure of up to 84.19%. The minimum training loss of the BM5 model was 0.1342, and the evaluation loss was 0.3064. The BLEU score for summary-to-keyword generation was 37.336. Moreover, we proposed an SP (Sentence Pattern) matching method to perform knowledge extraction. Furthermore, SC (Sentiment Classification) was used to build a sentiment classifier thesaurus for identifying fraudulent and infeasible crowdfunding projects. The proposed BM5-SP-SC achieved an overall accuracy of 85.26% in detecting fraudulent crowdfunding projects.
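The reported keyword-extraction figures can be checked against each other with a short calculation. The sketch below assumes that the "correct rate" is precision and that the F-measure is the balanced F1 score (the harmonic mean of precision and recall); the function name `f_measure` and the `beta` parameter are illustrative and are not taken from the article.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was extracted correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Figures from the abstract, read as precision and recall (an assumption).
precision = 0.727
recall = 1.0

f1 = f_measure(precision, recall)
print(f"{f1:.2%}")  # prints "84.19%"
```

Under these assumptions, a precision of 72.7% and a recall of 100% give an F1 of about 84.19%, which matches the F-measure stated in the abstract.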


Keywords: crowdfunding projects; contradiction detection; knowledge extraction; BERT; MT5; feature information extraction


*Corresponding author: Tel.: (+66) 863759307
E-mail: [email protected]

Article Details

Section
Original Research Articles
