Fast synthesis of the minority class using generative adversarial networks for imbalanced data classification problems

Walisa Romsaiyud

Abstract

Generative Adversarial Networks (GANs) are a class of deep neural networks that can be used to generate data examples in imbalanced data situations. A GAN consists of two simultaneously trained models: a generator and a discriminator. The generator produces new examples from random noise while learning to mimic the training dataset, and the discriminator distinguishes generated examples from real ones. We study overlapping data transfer during generation in distributed real-time data streaming. This paper proposes an extension of GANs, called GANs2T, that is based on time series functions and tabular data and aims to improve model and run-time performance. We use this technique to capture the covariance structure of the minority class and to generate synthetic samples along the probability contours for learning algorithms on streaming data. Experiments cover binary-class and multi-class imbalanced learning on several benchmark datasets. With the XGBoost algorithm, GANs2T achieves an overall accuracy of 84.93% and an average training time of 60.20 s.
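
The idea of capturing the covariance structure of the minority class and sampling along its probability contours can be illustrated with a much simpler stand-in than a GAN. The sketch below is not GANs2T and is not taken from the paper: it fits a single Gaussian to the minority rows and draws synthetic rows from it. The function name `oversample_minority`, the toy data, and the ridge value are all assumptions made for illustration.

```python
import numpy as np


def oversample_minority(X_min, n_new, seed=0):
    """Draw n_new synthetic rows from a Gaussian fitted to the minority class.

    Illustrative stand-in only: a GAN would learn the distribution with a
    generator network instead of assuming it is Gaussian.
    """
    rng = np.random.default_rng(seed)
    mean = X_min.mean(axis=0)
    # rowvar=False treats each column as a feature, each row as a sample.
    cov = np.atleast_2d(np.cov(X_min, rowvar=False))
    # A small ridge (assumed value) keeps the matrix positive definite
    # when the minority class has very few rows.
    cov = cov + 1e-6 * np.eye(X_min.shape[1])
    return rng.multivariate_normal(mean, cov, size=n_new)


# Toy imbalanced data: 500 majority rows, 25 correlated minority rows.
rng = np.random.default_rng(42)
X_major = rng.normal(0.0, 1.0, size=(500, 2))
X_minor = rng.multivariate_normal(
    [3.0, 3.0], [[1.0, 0.8], [0.8, 1.0]], size=25
)

# Generate enough synthetic rows to match the majority class size.
X_synth = oversample_minority(X_minor, n_new=len(X_major) - len(X_minor))
X_minor_balanced = np.vstack([X_minor, X_synth])
print(X_minor_balanced.shape)
```

Because the synthetic rows are drawn from the fitted mean and covariance, they follow the elliptical density contours of the minority class rather than interpolating between individual neighbours, as SMOTE-style methods do.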

Article Details

Section
Original Articles
