Optimization of information gain interval for determining artificial ripeness of banana using image data with imbalanced class
Keywords:
Banana artificial ripeness, k-Means, Optimized IG, Scott rule, Sturges rule
Abstract
Importance of the work: Precise determination of ripeness is crucial for post-harvest processing of fruits. Moreover, distinguishing naturally ripened from artificially ripened bananas requires specific features because the two have a similar physical appearance.
Objectives: To optimize the information gain (IG) interval to obtain the optimal features for identifying artificial banana ripeness from class-imbalanced data.
Materials & Methods: The study used 11,593 images of six Indonesian banana cultivars. In total, 78 features were extracted using morphological descriptors, a convex hull, a local binary pattern and a gray-level co-occurrence matrix. IG optimization was based on the Sturges rule, the Scott rule and k-means clustering. Oversampling with the synthetic minority oversampling technique (SMOTE) was used to handle the imbalanced data.
Results: Identification using extreme learning machine classification on imbalanced data was more accurate with IG optimized using the Sturges and Scott rules than with IG alone. The implementation of SMOTE also substantially increased the accuracy from 20% to 40% compared to the result with imbalanced data. Most of the accuracy (80%) resulted from using selected features for four cultivars (ambon lumut, hijau, kepok and raja). The two other banana cultivars (morosebo and susu) had accuracy levels of more than 71% and 76%, respectively.
Main finding: Because choosing the optimum number of IG bin intervals is difficult for data with very similar characteristics, optimization using the Sturges and Scott rules produced greater accuracy, especially with imbalanced data.
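The abstract does not spell out how the bin interval enters the IG computation, so the following is only a minimal sketch under stated assumptions: each continuous feature is split into equal-width bins, the number of bins is chosen with the Sturges rule (k = ceil(1 + log2 n)) or the Scott rule (bin width h = 3.49 * sigma * n^(-1/3)), and IG is the class entropy minus the entropy conditioned on the bin. The function names (sturges_bins, scott_bins, information_gain) are hypothetical and are not taken from the article.

```python
import numpy as np

def sturges_bins(n):
    """Number of bins by the Sturges rule: k = ceil(1 + log2(n))."""
    return int(np.ceil(1 + np.log2(n)))

def scott_bins(x):
    """Number of bins by the Scott rule: width h = 3.49 * sigma * n^(-1/3)."""
    x = np.asarray(x, dtype=float)
    h = 3.49 * x.std(ddof=1) * len(x) ** (-1 / 3)
    if h == 0:  # constant feature: a single bin is enough
        return 1
    return max(1, int(np.ceil((x.max() - x.min()) / h)))

def entropy(labels):
    """Shannon entropy (in bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(x, y, n_bins):
    """IG of feature x about labels y after equal-width binning (assumed scheme)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Interior edges map each value to a bin id in 0..n_bins-1.
    bin_ids = np.searchsorted(edges[1:-1], x, side="right")
    conditional = 0.0
    for b in np.unique(bin_ids):
        mask = bin_ids == b
        conditional += mask.mean() * entropy(y[mask])
    return entropy(y) - conditional

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=500)              # synthetic labels, not the banana data
    x = y + rng.normal(scale=0.5, size=500)       # synthetic feature correlated with y
    for name, k in [("Sturges", sturges_bins(len(x))), ("Scott", scott_bins(x))]:
        print(name, k, round(information_gain(x, y, k), 3))
```

Ranking the 78 features by this score for each candidate bin count would be one way to compare the rules; the k-means variant mentioned in the abstract would replace the equal-width edges with cluster-derived boundaries and is not shown here.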
License
Copyright (c) 2023 Kasetsart University
online 2452-316X print 2468-1458/Copyright © 2022. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/),
production and hosting by Kasetsart University of Research and Development Institute on behalf of Kasetsart University.