The Development of a Semantic-based Image Retrieval Model by Pre-training Neural Network


Chakkarin Santirattanaphakdi
Suphakit Niwattanakul


This research aims to develop a semantic-based image retrieval model by applying the Contrastive Language-Image Pre-training (CLIP) model. Retrieval performance was evaluated with precision, recall, and F-measure. Search results for queries by global labels and for queries by high-level concepts of the images had a very good level of precision, indicating that the model can efficiently retrieve images based on their content. Results for queries by qualitative semantic concepts of the images also had a good level of precision, but they were far from users' expectations, because the semantics of an image are interpreted through experience, following the principles of human perception. In addition, it is difficult to evaluate whether an interpretation of an image's semantics is correct. The output of this research can resolve the semantic gap problem and supports users in querying with natural language that attaches to the semantics of the image rather than to the grammar of the language. These results can serve as a guideline for semantic information retrieval in the future.
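The abstract names precision, recall, and F-measure without defining them. The following is a minimal sketch of the standard set-based definitions for a single query, not the authors' evaluation code. The function name `precision_recall_f` and the image identifiers are hypothetical and chosen only for illustration.

```python
def precision_recall_f(retrieved, relevant):
    """Return (precision, recall, f_measure) for one query.

    retrieved: identifiers of the images the system returned
    relevant:  identifiers of the images judged relevant to the query
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)  # relevant images that were retrieved

    # Guard against empty sets so the ratios are always defined.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0

    # F-measure is the harmonic mean of precision and recall.
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical query result: 4 images returned, 3 images actually relevant.
retrieved = ["img_01", "img_02", "img_03", "img_04"]
relevant = ["img_01", "img_03", "img_07"]
p, r, f = precision_recall_f(retrieved, relevant)
print(f"precision={p:.3f} recall={r:.3f} f-measure={f:.3f}")
```

In this made-up case two of the four returned images are relevant, so precision is 2/4, recall is 2/3, and the F-measure is 4/7. High precision with lower recall, as here, means most returned images are correct but some relevant images are missed.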

Article Details

How to Cite
Santirattanaphakdi, C., & Niwattanakul, S. (2023). The Development of a Semantic-based Image Retrieval Model by Pre-training Neural Network. Journal of Science Ladkrabang, 32(2), 80–96. Retrieved from
Research article

