Improving Multi-label Classification Using Feature Reconstruction Methods

Main Article Content

Worawith Sangkatip
Phatthanaphong Chomphuwiset*

Abstract

Multi-label classification (MLC) is a supervised classification setting in which a data instance may be associated with more than one class label (or target). Solving MLC remains a challenging task: because the class labels are not mutually exclusive, MLC can produce complex decision boundaries. Many techniques have been proposed to cope with this complexity, such as the problem transformation method (PTM), the adaptation method (AM), and the ensemble method (EM). These techniques generally produce good results on certain datasets, but their classification performance degrades as the number of possible class labels grows, even when the dataset is well represented (high density). The aim of this work was to address MLC problems by performing a feature reconstruction process on the original data features. The proposed feature reconstruction method generates a set of compact features from the original data instances: an AutoEncoder is trained to encode the data features (the feature construction step) before they are passed to learning algorithms (classifiers). We conducted experiments with multi-label classifiers based on PTM, AM, and EM over a set of standard benchmark datasets. The results demonstrate that the proposed feature reconstruction technique yields promising classification performance, especially on high-density data.
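The pipeline the abstract describes — learning a compact encoding of the original features with an AutoEncoder, then training multi-label classifiers on the encoded features — can be sketched roughly as follows. This is a minimal single-hidden-layer autoencoder in plain NumPy on random stand-in data; the layer size, learning rate, and epoch count are illustrative assumptions, not the authors' actual configuration, and the downstream multi-label classifier is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a multi-label dataset: 200 instances, 20 original features.
# (Real experiments would use a benchmark dataset and its label matrix.)
X = rng.normal(size=(200, 20))

n_hidden = 8   # size of the compact (reconstructed) feature space - an assumption
lr = 0.01
W1 = rng.normal(scale=0.1, size=(20, n_hidden))   # encoder weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, 20))   # decoder weights
b2 = np.zeros(20)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Train the autoencoder to reproduce its own input (squared reconstruction error).
for epoch in range(500):
    H = sigmoid(X @ W1 + b1)        # encode
    X_hat = H @ W2 + b2             # decode (linear output layer)
    err = X_hat - X
    # Backpropagation through the two layers.
    gW2 = H.T @ err / len(X)
    gb2 = err.mean(axis=0)
    dH = (err @ W2.T) * H * (1 - H)
    gW1 = X.T @ dH / len(X)
    gb1 = dH.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# The encoder output is the compact feature representation that would then be
# fed to a multi-label classifier (e.g. a PTM such as Binary Relevance, or ML-kNN).
Z = sigmoid(X @ W1 + b1)
print(Z.shape)   # (200, 8)
```

The key design point is that only the encoder half is kept after training: the classifiers never see the original 20 features, only the 8 reconstructed ones.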


Keywords: multi-label classification; multi-label feature transformation; feature engineering; high density data; feature reconstruction


*Corresponding author: E-mail: [email protected]

Article Details

Section
Original Research Articles

References

Chandran, S.A. and Panicker, J.R., 2017. An efficient multi-label classification system using ensemble of classifiers. Proceedings of the International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT), Kerala, India, July 6-7, 2017, pp. 1133-1136.

Prajapati, P. and Thakkar, A., 2021. Performance improvement of extreme multi-label classification using K-way tree construction with parallel clustering algorithm. Journal of King Saud University - Computer and Information Sciences, DOI: 10.1016/j.jksuci.2021.02.014.

Bogatinovski, J., Todorovski, L., Dzeroski, S. and Kocev, D., 2021. Comprehensive Comparative Study of Multi-label Classification Methods. [online] Available at: https://arxiv.org/pdf/2102.07113.pdf.

Alazaidah, R. and Ahmad, F.K., 2016. Trending challenges in multi label classification. International Journal of Advanced Computer Science and Applications, 7, DOI: 10.14569/IJACSA.2016.071017.

Pushpa, M. and Karpagavalli, S., 2017. Multi-label classification: Problem transformation methods in Tamil phoneme classification. Procedia Computer Science, 115, 572-579.

Alluwaici, M., Junoh, A.K. and Alazaidah, R., 2020. New problem transformation method based on the local positive pairwise dependencies among labels. Journal of Information and Knowledge Management, 19(1), DOI: 10.1142/S0219649220400171.

Boutell, M.R., Luo, J., Shen, X. and Brown, C.M., 2004. Learning multi-label scene classification. Pattern Recognition, 37(9), 1757-1771.

Tsoumakas, G. and Katakis, I., 2007. Multi-label classification: An overview. International Journal of Data Warehousing and Mining, 3(3), 1-13.

Madjarov, G., Kocev, D., Gjorgjevikj, D. and Džeroski, S., 2012. An extensive experimental comparison of methods for multi-label learning. Pattern Recognition, 45(9), 3084-3104.

Sangkatip, W. and Phuboon-Ob, J., 2020. Non-communicable diseases classification using multi-label learning techniques. Proceedings of the 5th International Conference on Information Technology (InCIT), Chonburi, Thailand, October 21-22, 2020, pp. 17-21.

Sousa, R. and Gama, J., 2016. Online multi-label classification with adaptive model rules. Proceedings of the 17th Conference of the Spanish Association for Artificial Intelligence, Salamanca, Spain, September 14-16, 2016, pp. 58-67.

García, S.M., Mantas, C., Castellano, F. and Abellán, J., 2019. Ensemble of classifier chains and Credal C4.5 for solving multi-label classification. Progress in Artificial Intelligence, 8, DOI: 10.1007/s13748-018-00171-x.

Zhang, M.L. and Zhou, Z.H., 2007. ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition, 40(7), 2038-2048.

Zhang, M.L. and Zhou, Z.H., 2006. Multilabel neural networks with applications to functional genomics and text categorization. IEEE Transactions on Knowledge and Data Engineering, 18(10), 1338-1351.

Read, J., Pfahringer, B. and Holmes, G., 2008. Multi-label classification using ensembles of pruned sets. Proceedings of the IEEE International Conference on Data Mining, Pisa, Italy, December 15-19, 2008, pp. 995-1000.

Jin, W., Hong, W., Cuiping, X., Weihua, O., Qiaosong, C. and Xin, D., 2017. Ensembles of classifier chains for multi-label classification based on Spark. Journal of University of Science and Technology of China, 47(4), 350-357.

Tsoumakas, G., Katakis, I. and Vlahavas, I., 2011. Random k-labelsets for multilabel classification. IEEE Transactions on Knowledge and Data Engineering, 23(7), 1079-1089.

Zhang, M.-L., 2011. LIFT: Multi-label learning with label-specific features. Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, July 16-22, 2011, pp. 1609-1614.

Gao, W., Hu, J., Li, Y. and Zhang, P., 2020. Feature redundancy based on interaction information for multi-label feature selection. IEEE Access, 8, 146050-146064.

Huang, J., Li, G. and Wu, X., 2018. Joint feature selection and classification for multilabel learning. IEEE Transactions on Cybernetics, 48, 1-14.

Dong, G. and Liu, H., 2018. Feature Engineering for Machine Learning and Data Analytics. New York: CRC Press.

Hafeez, G., Khan, I., Jan, S., Shah, I.A., Khan, F.A. and Derhab, A., 2021. A novel hybrid load forecasting framework with intelligent feature engineering and optimization algorithm in smart grid. Applied Energy, 299, 117178.

Emmert-Streib, F., Yang, Z., Feng, H., Tripathi, S. and Dehmer, M., 2020. An introductory review of deep learning for prediction models with big data. Frontiers in Artificial Intelligence, 3, DOI: 10.3389/frai.2020.00004.

Deng, Z., Wang, S. and Chung, F.L., 2013. A minimax probabilistic approach to feature transformation for multi-class data. Applied Soft Computing, 13(1), 116-127.

Patterson, J. and Gibson, A., 2017. Deep Learning. California: O’Reilly Media.

Cheng, Y., Zhao, D., Wang, Y. and Pei, G., 2019. Multi-label learning with kernel extreme learning machine autoencoder. Knowledge-Based Systems, 178, 1-10.

Read, J., Puurula, A. and Bifet, A., 2014. Multi-label classification with meta-labels. Proceedings of the IEEE International Conference on Data Mining, Shenzhen, China, December 14-17, 2014, pp. 941-946.

Cherman, E., Monard, M.-C. and Metz, J., 2011. Multi-label problem transformation methods: a case study. CLEI Electronic Journal, 14, DOI: 10.19153/cleiej.14.1.4.

Cortes, C. and Vapnik, V., 1995. Support-vector networks. Machine Learning, 20(3), 273-297.

Elisseeff, A. and Weston, J., 2001. Kernel methods for multi-labelled classification and categorical regression problems. Advances in Neural Information Processing Systems, 14, 681-687.

Gibaja, E., Moyano, J. and Ventura, S., 2016. An ensemble-based approach for multi-view multi-label classification. Progress in Artificial Intelligence, 5, DOI: 10.1007/s13748-016-0098-9.

Rokach, L., Schclar, A. and Itach, E., 2013. Ensemble methods for multi-label classification. Expert Systems with Applications, 41, DOI: 10.1016/j.eswa.2014.06.015.

Kimura, K., Kudo, M., Sun, L. and Koujaku, S., 2016. Fast random k-labelsets for large-scale multi-label classification. Proceedings of the 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, December 4-8, 2016, pp. 438-443.

Tsoumakas, G., Spyromitros-Xioufis, E., Vilcek, J. and Vlahavas, I., 2011. MULAN: A Java library for multi-label learning. Journal of Machine Learning Research, 12, 2411-2414.

Liou, C.-Y., Cheng, W.-C., Liou, J.-W. and Liou, D.-R., 2014. Autoencoder for words. Neurocomputing, 139, 84-96.

Read, J., Pfahringer, B., Holmes, G. and Frank, E., 2011. Classifier chains for multi-label classification. Machine Learning, 85(3), 333-359.

Chen, W.-J., Shao, Y.-H., Li, C.-N. and Deng, N.-Y., 2016. MLTSVM: a novel twin support vector machine to multi-label learning. Pattern Recognition, 52, 61-74.

Szymański, P. and Kajdanowicz, T., 2019. Scikit-multilearn: a scikit-based Python environment for performing multi-label classification. Journal of Machine Learning Research, 20, 209-230.

Wu, X.-Z. and Zhou, Z.-H., 2017. A unified view of multi-label performance measures. Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, August 6-11, 2017, pp. 3780-3788.