Evaluation and comparison of customer service performance generated by generative AI

Authors

  • Nattha Phiwma, Faculty of Science and Technology, Suan Dusit University, Bangkok
  • Thannob Aribarg, College of Digital Innovation Technology, Rangsit University, Pathum Thani

Keywords:

Generative Artificial Intelligence, Artificial Intelligence, ChatGPT, Customer Service, Large Language Models

Abstract

Generative AI can converse with users and understand natural language, and many such programs are now available. A model that answers questions much as a human would is likely to impress users and be useful in customer service applications. The objective of this research was therefore to evaluate and compare the effectiveness of customer question-answering services generated by three Generative AI models: ChatGPT, Claude, and Gemini. The research procedure involved preparing a dataset of questions with correct answers, generating responses with each model, and assessing those responses using the ROUGE-L algorithm and expert evaluation. Each model answered the questions in two formats, a standard prompt and a roleplay prompt, with each question answered three times per format. The results were as follows: 1) Under ROUGE-L, ChatGPT's answers were the most similar to the correct answers in both formats, with an average score of 0.26547 for the standard prompt and 0.29307 for the roleplay prompt; 2) In the expert evaluation, ChatGPT's answers were likewise rated closest to the correct answers in both formats, with an average score of 4.1556 for the standard prompt and 4.2867 for the roleplay prompt.
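
The ROUGE-L scoring and per-format averaging described in the abstract can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the whitespace tokenization, the choice of beta = 1 (a balanced F-measure), and the sample strings are all assumptions introduced here, and Thai text would need a proper word segmenter rather than `str.split`.

```python
from statistics import mean


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate, beta=1.0):
    """ROUGE-L F-measure between a reference answer and a generated answer.

    Assumption: lowercase whitespace tokenization and beta = 1.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref)      # share of the reference that is covered
    precision = lcs / len(cand)  # share of the candidate that matches
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


def average_rouge_l(reference, candidates):
    """Mean ROUGE-L over repeated generations for one question."""
    return mean(rouge_l(reference, c) for c in candidates)


if __name__ == "__main__":
    # Hypothetical data: one reference answer and three generated runs.
    reference = "you can return the item within 30 days of purchase"
    runs = [
        "items can be returned within 30 days of purchase",
        "you may return the item within 30 days",
        "please contact support for our return policy",
    ]
    print(round(average_rouge_l(reference, runs), 5))
```

Averaging over the repeated runs for each question, and then over questions, would give one score per model and prompt format, which is the kind of figure the abstract reports.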

References

นันท์นภัส ประจงการ. (2017). Guidelines for adopting chatbots for customer service. Independent study, Master of Science, Marketing Management, Faculty of Commerce and Accountancy, Thammasat University. (in Thai)

ศรุตา จรูญเมธี. (2024). 5 reasons Generative AI assistants are essential for customer service. Accessed 10 Sep. 2024. https://www.amitysolutions.com/th/blogs/5-ways-Generative-ai-assistants-master-customer-service (in Thai)

สมพงษ์ อัศวริยธิปัติ and รชพร จันทร์สว่าง. (2023). The efficiency of automated response systems affecting the communication service quality of domestic airline businesses. วารสารวิจัยรำไพพรรณี. 17(3): 77-89. (in Thai)

สุทธิภาส ชาญชัยประสิทธิ์ and วศิณ ชูประยูร. (2020). An evaluation of the efficiency, effectiveness, and satisfaction of using bank chatbot programs in Thailand. วารสารรังสิตสารสนเทศ. 26(1): 117-136. (in Thai)

Boruah M. (2024). Chatbots vs. Humans: Which one should you choose and why? Accessed 11 Oct. 2024. https://www.kommunicate.io/blog/chatbots-vs-humans/

Chowdhury N.M. (2023). Improving customer care with ChatGPT: A case study. Bachelor of Science, Computer Science and Technology. Chongqing University of Posts and Telecommunications, Chongqing, China.

Gartner. (2024). Top strategic technology trends 2024: Advancing business value with AI trust and sustainability. Accessed 11 Oct. 2024. https://www.gartner.com/en/articles/gartner-top-10-strategic-technology-trends-for-2024

IBM. (2023). 5 types of chatbot and how to choose the right one for your business. Accessed 6 Jul. 2024. https://www.ibm.com/think/topics/chatbot-types

Lee, G.-G., Latif, E., Wu, X., Liu, N., and Zhai, X. (2024). Applying large language models and chain-of-thought for automatic scoring. Computers and Education: Artificial Intelligence, 6, 100213.

Lin C.Y. (2004). ROUGE: A package for automatic evaluation of summaries. Text Summarization Branches Out. Barcelona, Spain.

Lins dos Santos, L.F. (2023). Evaluating and comparing generative-based chatbots based on process requirements. Master of Mathematics Thesis. Computer Science, University of Waterloo.

McKinsey and Company. (2023). The state of AI in 2023: Generative AI's breakout year. McKinsey Digital. Accessed 11 Oct. 2024. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai-in-2023-Generative-ais-breakout-year

MIT Technology Review. (2019). The AI issue: The state of artificial intelligence. Massachusetts Institute of Technology. Accessed 6 Sep. 2024. https://www.technologyreview.com/2019/01/08/137912/the-state-of-artificial-intelligence/

Poonpresartporn A., Uttaranakorn P. and Chuaybamroong R. (2023). Comparing the accuracy and referencing of ChatGPT’s responses to herbal medicine queries: A zero-shot versus roleplay prompting approach. The Thai Journal of Pharmaceutical Sciences. 47(4): 1-11.

Stryker C. and Scapicchio M. (2024). What is Generative AI. Accessed 6 Sep. 2024. https://www.ibm.com/topics/Generative-ai

Subagja A.D., Ausat A.M.A., Sari A.R., Wanof M.I. and Suherlan. (2023). Improving customer service quality in MSMEs through the use of ChatGPT. Jurnal Minfo Polgan. 12(2): 380-386.

World Economic Forum. (2023). The future of jobs report 2023. World Economic Forum. Accessed 6 Sep. 2024. https://www.weforum.org/publications/the-future-of-jobs-report-2023/

Wu Y., Henriksson A., Duneld M. and Nouri J. (2023). Towards improving the reliability and transparency of ChatGPT for educational question answering. In European Conference on Technology Enhanced Learning, 28 August 2023. Switzerland. 475-488.

Yun J. and Park J. (2022). The effects of chatbot service recovery with emotion words on customer satisfaction, repurchase intention, and positive word-of-mouth. Accessed 6 Sep. 2024. https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2022.922503/full

Published

2025-04-09

How to Cite

Phiwma, N., & Aribarg, T. (2025). Evaluation and comparison of customer service performance generated by generative AI. Agriculture & Technology RMUTI Journal, 6(1), 125–136. Retrieved from https://li01.tci-thaijo.org/index.php/atj/article/view/265449