Optimization of information gain interval for determining artificial ripeness of banana using image data with imbalanced class

Authors

  • Candra Dewi, Biology Department, Faculty of Science, Universitas Brawijaya, Malang 64145, Indonesia; Informatics Department, Faculty of Computer Science, Universitas Brawijaya, Malang 64145, Indonesia
  • Endang Arisoesilaningsih, Biology Department, Faculty of Science, Universitas Brawijaya, Malang 64145, Indonesia
  • Wayan Firdaus Mahmudy, Informatics Department, Faculty of Computer Science, Universitas Brawijaya, Malang 64145, Indonesia
  • Solimun Solimun, Statistics Department, Faculty of Science, Universitas Brawijaya, Malang 64145, Indonesia

Keywords:

Banana artificial ripeness, k-Means, Optimized IG, Scott rule, Sturgess rule

Abstract

Importance of the work: Precise determination of ripeness is crucial for post-harvest processing of fruits. Moreover, distinguishing between natural and artificial ripeness of banana fruits requires specific features because naturally and artificially ripened fruits have a similar physical appearance.
Objectives: To optimize the information gain (IG) interval to obtain the optimal features in identifying artificial banana ripeness with imbalanced class data.
Materials & Methods: The test was based on six Indonesian banana cultivars using 11,593 images. In total, 78 features were extracted using morphological descriptors, a convex hull, a local binary pattern and a gray-level co-occurrence matrix. IG optimization was based on the Sturgess rule, Scott rule and K-means clustering. Oversampling based on the synthetic minority oversampling technique (SMOTE) was used to handle imbalanced data.
Results: With imbalanced data, identification using extreme learning machine classification was more accurate when IG was optimized using the Sturgess and Scott rules than when unoptimized IG was used. The implementation of SMOTE also substantially increased the accuracy from 20% to 40% compared to the result with imbalanced data. Most of the accuracy (80%) resulted from using selected features for four cultivars (ambon lumut, hijau, kepok and raja). The two other banana cultivars (morosebo and susu) had accuracy levels of more than 71% and 76%, respectively.
Main finding: Due to the complexity of choosing the optimum number of IG bin intervals for data with very high similarity characteristics, optimization using Sturgess and Scott rules produced greater accuracy, especially with imbalanced data.
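
Illustrative sketch (not the authors' implementation): the abstract names the Sturgess rule (more commonly spelled Sturges' rule) and the Scott rule as ways to choose the number of IG bin intervals, but does not say how the bins are applied. The Python sketch below assumes equal-width binning of a single continuous feature and computes its information gain against class labels. The function names (sturges_bins, scott_bins, information_gain) and the toy data are hypothetical and chosen only for illustration; the K-means variant and SMOTE are not shown.

```python
import math
import statistics
from collections import Counter


def sturges_bins(n):
    """Sturges' rule: k = ceil(log2(n)) + 1 bins for n samples."""
    return math.ceil(math.log2(n)) + 1


def scott_bins(values):
    """Scott's rule: bin width h = 3.49 * s * n^(-1/3), converted to a bin count."""
    n = len(values)
    s = statistics.stdev(values)
    span = max(values) - min(values)
    if s == 0 or span == 0:
        return 1  # a constant feature needs only one bin
    h = 3.49 * s * n ** (-1 / 3)
    return max(1, math.ceil(span / h))


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, k):
    """IG of a continuous feature after equal-width binning into k intervals.

    Assumption: equal-width bins; the abstract does not specify the scheme.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # fall back to 1.0 for a constant feature
    bins = {}
    for v, y in zip(values, labels):
        idx = min(int((v - lo) / width), k - 1)  # clamp the maximum into the last bin
        bins.setdefault(idx, []).append(y)
    n = len(labels)
    conditional = sum(len(ys) / n * entropy(ys) for ys in bins.values())
    return entropy(labels) - conditional


# Toy feature values and labels, invented for illustration only.
values = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = ["natural"] * 4 + ["artificial"] * 4

k_sturges = sturges_bins(len(values))
k_scott = scott_bins(values)
print("Sturges bins:", k_sturges, "IG:", information_gain(values, labels, k_sturges))
print("Scott bins:", k_scott, "IG:", information_gain(values, labels, k_scott))
```

Under these assumptions, the two rules can return different bin counts for the same feature, which changes the IG score and therefore the feature ranking. That sensitivity to the interval choice is what the abstract's optimization targets.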

Published

2023-08-31

How to Cite

Dewi, Candra, Endang Arisoesilaningsih, Wayan Firdaus Mahmudy, and Solimun Solimun. 2023. “Optimization of Information Gain Interval for Determining Artificial Ripeness of Banana Using Image Data With Imbalanced Class”. Agriculture and Natural Resources 57 (4). Bangkok, Thailand:615–624. https://li01.tci-thaijo.org/index.php/anres/article/view/260450.

Section

Research Article