A Comparison of Parameter Estimation Methods for Penalized Regression Analysis with High-Dimensional Data

เบญจมาส รุ่งศรานนท์
อัชฌา อระวีพร

Abstract

The objective of this research is to compare the efficiency of coefficient estimation for five penalized regression methods: ridge regression, lasso regression, elastic net regression, adaptive lasso regression, and adaptive elastic net regression. The study uses a multiple linear regression model consisting of one dependent variable and several independent variables, in the high-dimensional setting where the number of independent variables exceeds the sample size. The five methods are compared using the average mean square error as the criterion. The data are simulated as follows: for small sample sizes (n = 5, 10, and 15), the number of independent variables is set to 16; for medium sample sizes (n = 20, 30, and 40), it is set to 50; and for large sample sizes (n = 60, 70, and 80), it is set to 100. The independent variables are generated from a normal distribution, and the random errors are generated from a normal distribution, a contaminated normal distribution, and a Weibull distribution. Each case is simulated with the Monte Carlo technique using 1,000 replications. The results show that adaptive elastic net regression yields the minimum average mean square error in all cases. Furthermore, the five methods are applied to a real data set with a small sample size and 16 independent variables. The real-data results agree with the simulation: adaptive elastic net regression outperforms the other methods.
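To make the simulation design concrete, the following is a minimal pure-Python sketch under stated assumptions, not the authors' code. It fits all five estimators with a single cyclic coordinate-descent solver for the objective (1/2n)||y − Xb||² + λ(α Σ w_j|b_j| + (1 − α)/2 ||b||²): α = 0 gives ridge, α = 1 gives lasso, 0 < α < 1 gives elastic net, and weights w_j = 1/|b̃_j| computed from an initial ridge or elastic net fit give the adaptive lasso and adaptive elastic net. Everything not stated in the abstract is an assumption: the fixed tuning values (λ = 0.1, α = 0.5) instead of whatever tuning rule the study used, the hypothetical true coefficient vector, measuring MSE on the coefficients, the contamination and Weibull parameters, the absence of an intercept, and omitting the rescaling factor used in the original adaptive elastic net proposal. The helper names (`fit_penalized`, `adaptive_weights`, `draw_error`, `simulate_amse`) are hypothetical.

```python
import math
import random


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_penalized(X, y, lam, alpha, weights=None, sweeps=100):
    """Cyclic coordinate descent (no intercept) for
        (1/2n)||y - Xb||^2 + lam * (alpha * sum_j w_j |b_j| + (1 - alpha)/2 * ||b||^2).
    alpha = 0 gives ridge, alpha = 1 lasso, 0 < alpha < 1 elastic net; passing
    adaptive weights w_j gives the adaptive lasso / adaptive elastic net."""
    n, p = len(X), len(X[0])
    w = weights if weights is not None else [1.0] * p
    b = [0.0] * p
    r = list(y)  # current residuals y - Xb (b starts at zero)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # (1/n) x_j^T (partial residual that excludes coordinate j)
            z = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(z, lam * alpha * w[j]) / (col_sq[j] + lam * (1 - alpha))
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new
    return b


def adaptive_weights(b_init, gamma=1.0, eps=1e-6):
    """w_j = 1 / |b_init_j|^gamma; eps (an assumption) avoids division by zero."""
    return [1.0 / (abs(bj) + eps) ** gamma for bj in b_init]


def draw_error(rng, kind, sigma):
    """Hypothetical error distributions; the study's exact parameters are not stated."""
    if kind == "normal":
        return rng.gauss(0.0, sigma)
    if kind == "contaminated":
        # 90% N(0, sigma^2) mixed with 10% N(0, (5 sigma)^2)
        return rng.gauss(0.0, sigma if rng.random() < 0.9 else 5.0 * sigma)
    if kind == "weibull":
        # Weibull(scale=sigma, shape=1.5), centred by subtracting its mean
        shape = 1.5
        return rng.weibullvariate(sigma, shape) - sigma * math.gamma(1.0 + 1.0 / shape)
    raise ValueError(f"unknown error distribution: {kind}")


def simulate_amse(n=10, p=16, reps=20, error="normal", lam=0.1, sigma=1.0, seed=1):
    """Monte Carlo estimate of the average MSE of the coefficient estimates."""
    rng = random.Random(seed)
    beta = [3.0, 1.5, 0.0, 0.0, 2.0] + [0.0] * (p - 5)  # hypothetical sparse truth
    totals = dict.fromkeys(("ridge", "lasso", "enet", "alasso", "aenet"), 0.0)
    for _ in range(reps):
        X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
        y = [sum(x * bt for x, bt in zip(row, beta)) + draw_error(rng, error, sigma) for row in X]
        fits = {
            "ridge": fit_penalized(X, y, lam, 0.0),
            "lasso": fit_penalized(X, y, lam, 1.0),
            "enet": fit_penalized(X, y, lam, 0.5),
        }
        # Adaptive weights from the ridge fit (adaptive lasso) and elastic net fit (adaptive elastic net)
        fits["alasso"] = fit_penalized(X, y, lam, 1.0, adaptive_weights(fits["ridge"]))
        fits["aenet"] = fit_penalized(X, y, lam, 0.5, adaptive_weights(fits["enet"]))
        for name, b in fits.items():
            totals[name] += sum((bh - bt) ** 2 for bh, bt in zip(b, beta)) / p
    return {name: total / reps for name, total in totals.items()}


if __name__ == "__main__":
    for kind in ("normal", "contaminated", "weibull"):
        amse = simulate_amse(n=10, p=16, reps=10, error=kind)
        print(kind, {k: round(v, 4) for k, v in amse.items()}, "lowest:", min(amse, key=amse.get))
```

With only a handful of replications and an untuned λ, the printed ranking is illustrative only and need not reproduce the study's finding; a faithful replication would use 1,000 replications per case and choose the tuning parameters by the study's own procedure.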

Article Details

Section: Physical Sciences
Author Biographies

เบญจมาส รุ่งศรานนท์, Faculty of Science, King Mongkut's Institute of Technology Ladkrabang

Department of Statistics, Faculty of Science, King Mongkut's Institute of Technology Ladkrabang, Chalong Krung Road, Lat Krabang District, Bangkok 10520

อัชฌา อระวีพร

Department of Statistics, Faculty of Science, King Mongkut's Institute of Technology Ladkrabang, Chalong Krung Road, Lat Krabang District, Bangkok 10520
