Detecting Fraud Job Recruitment Using Features Reflecting from Real-world Knowledge of Fraud

Boonthida Chiraratanasopha*
Thodsaporn Chay-intr

Abstract

A common approach to text analysis and text-based classification is to use term frequencies or patterns of terms as features. However, these features alone may not be able to differentiate fake from authentic job advertisements. Thus, in this work, we propose a method to detect fake job recruitment using a novel set of features designed to reflect the behavior of fraudsters who present fake information. The features are missing information, exaggeration, and credibility. They are represented in the form of categories and an automatically generated readability score. Data from the EMSCAD dataset were transformed in accordance with the designed features and used to train a model for fake job detection. The experimental results showed that the model built on the designed features performed better than those based on the term-frequency approach for every machine learning technique applied. The best model from the proposed method yielded 97.64% accuracy, 0.97 precision, and 0.99 recall when classifying fake job advertisements.
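
As a rough illustration of how features of this kind might be derived from a single advertisement, the Python sketch below computes a missing-information category, an exaggeration category, and a readability score. It is not the authors' implementation: the field names, the hype-word list, the category labels, and the choice of the Flesch Reading Ease formula are all assumptions made for the example. The credibility feature is left out because the abstract does not say how it is measured.

```python
import re

# Hypothetical field names; the actual EMSCAD columns may differ.
FIELDS = ("company_profile", "description", "requirements", "benefits")

# Hypothetical exaggeration cues, chosen only for illustration.
HYPE_WORDS = {"guaranteed", "unlimited", "easy", "instant", "amazing"}


def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Flesch Reading Ease score; higher values mean easier text."""
    words = re.findall(r"[A-Za-z]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


def missing_info_category(ad):
    """Bucket an ad by how many of the expected fields are empty."""
    missing = sum(1 for f in FIELDS if not (ad.get(f) or "").strip())
    if missing == 0:
        return "complete"
    return "partial" if missing < len(FIELDS) else "empty"


def exaggeration_category(text):
    """Flag text containing hype words or repeated exclamation marks."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    hyped = bool(tokens & HYPE_WORDS) or "!!" in text
    return "exaggerated" if hyped else "plain"


def extract_features(ad):
    """Turn one advertisement (a dict of text fields) into features."""
    text = " ".join((ad.get(f) or "") for f in FIELDS)
    return {
        "missing_info": missing_info_category(ad),
        "exaggeration": exaggeration_category(text),
        "readability": round(flesch_reading_ease(text), 2),
    }


if __name__ == "__main__":
    ad = {"description": "Earn easy money!! No experience needed."}
    print(extract_features(ad))
```

Feature dictionaries like these could then be encoded and passed to any standard classifier. The vowel-group syllable counter is a crude heuristic, so the readability values should be treated as approximate.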


Keywords: fake job advertisement; internet fraud; feature design; fraud detection


*Corresponding author: Tel.: (+66) 843133015; E-mail: [email protected]

Article Details

Section
Original Research Articles
