Enhancing small dataset prediction of silver nanoparticle size with deep learning and Latin hypercube sampling framework


Chinakrit Akkawong
Tanawadee Dechakupt
Kulpavee Jitapunkul
Chanin Panjapornpon

Abstract

Laboratory experiments often face challenges such as inherent complexity, difficulties in data gathering, high costs, and time-consuming procedures. These constraints typically limit the amount of experimental data available, which leads to modeling problems such as overfitting and underfitting. To address these problems, this study applied a framework that integrates deep learning with Latin hypercube sampling (LHS) to improve prediction models built from small datasets. A case study on size prediction in silver nanoparticle synthesis was used to demonstrate the performance of the framework. LHS augments the raw data available for model development: the original raw data and the LHS-generated data were combined as training data for a deep learning prediction model. The resulting model showed improved prediction performance, with R² values of 0.924 and 0.918 on the validation and test datasets, respectively. In addition, accuracy on unseen test data was markedly higher than that of a model trained on the small dataset alone, rising from 0.442 to 0.893. The proposed framework enables high-accuracy prediction of silver nanoparticle size from small experimental datasets, for conditions within the specified boundaries.
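
The sampling step described in the abstract can be illustrated with a minimal Python sketch. It covers only how LHS spreads input points across a bounded design space. The abstract does not describe how size values were assigned to the sampled points, so that step is left out. The three variables, their bounds, and the sample count are hypothetical placeholders, not values from the study, and SciPy's `qmc.LatinHypercube` is used here as one readily available implementation, not necessarily the one the authors used.

```python
import numpy as np
from scipy.stats import qmc

# Hypothetical lower/upper bounds for three synthesis variables.
# These numbers are placeholders for illustration, not values from the study.
lower = np.array([0.5, 20.0, 5.0])
upper = np.array([5.0, 90.0, 120.0])

n_samples = 50  # assumed number of augmented points

# Draw a Latin hypercube design in the unit cube, then scale to the bounds.
sampler = qmc.LatinHypercube(d=len(lower), seed=0)
unit_samples = sampler.random(n=n_samples)
samples = qmc.scale(unit_samples, lower, upper)

# Defining property of LHS: along every variable, each of the n_samples
# equal-width intervals contains exactly one point. After sorting a column,
# the i-th value must therefore lie in the i-th interval.
sorted_units = np.sort(unit_samples, axis=0)
interval_lo = (np.arange(n_samples) / n_samples)[:, None]
interval_hi = (np.arange(1, n_samples + 1) / n_samples)[:, None]
stratified = bool(
    np.all(sorted_units >= interval_lo) and np.all(sorted_units <= interval_hi)
)

print(samples.shape, stratified)
```

Because every interval of every variable is covered exactly once, a modest number of points still spans the whole bounded region, which is consistent with the abstract's note that predictions apply to conditions within the specified boundaries.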


Article Details

How to Cite
Akkawong, C., Dechakupt, T., Jitapunkul, K., & Panjapornpon, C. (2024). Enhancing small dataset prediction of silver nanoparticle size with deep learning and Latin hypercube sampling framework. Science, Engineering and Health Studies, 18, 24020012. https://doi.org/10.69598/sehs.18.24020012
Section
Physical sciences
