Classification Model Development Based on Cluster-to-Class Distance Mapping for Tourism Form Prediction of Inbound Tourism Market in Thailand


Unnadathorn Moonpen
Surasak Mungsing
Thepparit Banditwattanawong*


This paper describes the development of classification models for inbound tourism forms in Thailand. The models used both a labeled data set and an originally unlabeled one. The latter, obtained from the Ministry of Tourism and Sports of Thailand, which regularly collects unlabeled data, required tourism form labels to be synthesized before it could be used for classification. To synthesize these labels, we propose a cluster-to-class mapping algorithm consisting of three steps. First, the best tourist clustering model for the unlabeled tourist data set is selected by comparing the results of K-means, hierarchical cluster analysis, random clustering, and DBSCAN techniques. Second, the clusters are mapped to the classes of the labeled data set based on Euclidean distance to reveal the tourism form labels of the clusters. Finally, the best tourism-form classification model is selected on the data sets with real and synthesized labels by comparing Naïve Bayes, support vector machine, linear regression, and decision tree techniques. Experimental results show that our algorithm generated the tourism form labels effectively: using them, we obtained a neural network model capable of predicting the inbound tourism forms of an unseen tourist data set with an F-measure as high as 98.99%.
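
The second step, mapping clusters to classes, can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that each cluster and each class is summarized by its centroid and that a cluster receives the label of the class whose centroid is nearest in Euclidean distance. The function names, the two numeric features, and the labels "eco" and "cultural" are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def centroid(points):
    """Mean vector of a non-empty list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))


def group_centroids(points, group_ids):
    """Centroid of each group, keyed by group id (cluster id or class label)."""
    groups = defaultdict(list)
    for point, gid in zip(points, group_ids):
        groups[gid].append(point)
    return {gid: centroid(members) for gid, members in groups.items()}


def map_clusters_to_classes(unlabeled, cluster_ids, labeled, class_labels):
    """Give each cluster the label of the class with the nearest centroid.

    Assumption: nearest-centroid matching by Euclidean distance; the
    abstract does not spell out the exact mapping rule.
    """
    cluster_centers = group_centroids(unlabeled, cluster_ids)
    class_centers = group_centroids(labeled, class_labels)
    return {
        cid: min(
            class_centers,
            key=lambda label: math.dist(center, class_centers[label]),
        )
        for cid, center in cluster_centers.items()
    }


# Toy data (hypothetical): two numeric features per tourist.
unlabeled = [(1.0, 1.2), (0.8, 1.0), (5.1, 4.9), (4.9, 5.2)]
cluster_ids = [0, 0, 1, 1]  # output of a clustering step such as K-means
labeled = [(1.1, 0.9), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8)]
class_labels = ["eco", "eco", "cultural", "cultural"]  # hypothetical tourism forms

mapping = map_clusters_to_classes(unlabeled, cluster_ids, labeled, class_labels)
# Synthesized label for every originally unlabeled record.
synthesized_labels = [mapping[cid] for cid in cluster_ids]
print(mapping)
print(synthesized_labels)
```

Under this assumption, the synthesized labels can be appended to the unlabeled records, and the combined data set can then be passed to any of the classifiers compared in the third step.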


Keywords:  tourism form; classification algorithm; clustering algorithm; cluster-to-class mapping

*Corresponding author: Tel:  +66(0)2942 8200-45





Research Articles

