Enhancing small dataset prediction of silver nanoparticle size with deep learning and Latin hypercube sampling framework

Chinakrit Akkawong; Tanawadee Dechakupt; Kulpavee Jitapunkul; Chanin Panjapornpon

doi:10.69598/sehs.18.24020012

Published: Dec 31, 2024

Keywords:

Latin hypercube sampling; limited data; deep learning; artificial intelligence

Chinakrit Akkawong

Department of Chemical Engineering, Center of Excellence on Petrochemicals and Materials Technology, Faculty of Engineering, Kasetsart University, Bangkok, Thailand

Tanawadee Dechakupt

Department of Industrial Chemistry, Faculty of Applied Science, King Mongkut’s University of Technology North Bangkok, Bangkok, Thailand

Kulpavee Jitapunkul

Department of Chemical Engineering, Center of Excellence on Petrochemicals and Materials Technology, Faculty of Engineering, Kasetsart University, Bangkok, Thailand

Chanin Panjapornpon

Department of Chemical Engineering, Center of Excellence on Petrochemicals and Materials Technology, Faculty of Engineering, Kasetsart University, Bangkok, Thailand. Corresponding author's e-mail: fengcnp@ku.ac.th

Abstract

Laboratory experiments are often complex, costly, and time-consuming, and their data can be difficult to gather. These constraints typically limit the amount of experimental data available, which leads to modeling problems such as overfitting and underfitting. To address these problems, this study applied a framework that integrates deep learning with Latin hypercube sampling (LHS) to improve prediction models built on small datasets. A case study on size prediction in silver nanoparticle synthesis was used to demonstrate the performance of the framework. The LHS technique augments the raw data available for model development: the original raw data and the data generated by LHS were combined as training data for a deep learning prediction model. The resulting model showed improved prediction performance, with R² values of 0.924 and 0.918 on the validation and test datasets, respectively. In addition, the accuracy on unseen test data was significantly higher than that of a model trained on a small dataset, rising from 0.442 to 0.893. The proposed framework enables high-accuracy prediction of silver nanoparticle size from small experimental datasets, including for other conditions within the specified boundaries.
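As a rough illustration of the sampling step named in the abstract, the sketch below draws Latin hypercube samples in plain Python: each variable's range is split into as many equal strata as there are samples, one value is drawn per stratum, and the strata are paired at random across variables. This is a generic LHS sketch, not the authors' implementation. The function name `latin_hypercube`, the three variables, and their ranges are hypothetical, and the abstract does not say how size values were assigned to the generated points, so that step is left out.

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Return n_samples points inside the box given by bounds.

    bounds is a list of (low, high) pairs, one per variable. Each
    variable's range is split into n_samples equal strata, and every
    stratum is used exactly once.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        # One uniform draw inside each stratum of this variable.
        column = [low + (i + rng.random()) * width for i in range(n_samples)]
        # Shuffle so strata are paired randomly across variables.
        rng.shuffle(column)
        columns.append(column)
    # Transpose per-variable columns into per-sample rows.
    return [list(point) for point in zip(*columns)]


# Hypothetical synthesis variables and ranges (assumed, not from the paper).
bounds = [(20.0, 80.0), (0.5, 5.0), (4.0, 10.0)]
samples = latin_hypercube(10, bounds, seed=0)
```

Compared with plain random sampling, this stratification guarantees that every part of each variable's range is represented, which is the usual reason for choosing LHS when only a few points can be generated.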

How to Cite

Akkawong, C., Dechakupt, T., Jitapunkul, K., & Panjapornpon, C. (2024). Enhancing small dataset prediction of silver nanoparticle size with deep learning and Latin hypercube sampling framework. Science, Engineering and Health Studies, 18, 24020012. https://doi.org/10.69598/sehs.18.24020012

Issue

Volume 18, 2024

Section

Physical sciences

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
