A COMPARISON OF BOOTSTRAP METHODS FOR ADAPTIVE LASSO + PARTIAL RIDGE TO CONSTRUCT CONFIDENCE INTERVALS FOR PARAMETERS IN HIGH-DIMENSIONAL SPARSE LINEAR MODELS


Parit Chancherngpanich
Vitara Pungpapong

Abstract

This research proposes a method, called bootstrap adaptive lasso + partial ridge (ALPR), to construct confidence intervals for regression coefficients in high-dimensional data, and compares its performance with bootstrap lasso + partial ridge (LPR). ALPR is a two-stage estimator: the adaptive lasso selects variables, and the partial ridge refits the coefficients. We apply two bootstrap techniques, the residual bootstrap and the paired bootstrap. We also consider two coefficient settings, weak sparsity and hard sparsity, which refer to the cases where the majority of coefficients are close to zero and exactly zero, respectively. Simulation studies cover 8 cases of high-dimensional covariates generated from multivariate normal distributions with different types of covariance matrix. Mean interval lengths and coverage probabilities are used to measure and compare the performance of the bootstrap methods. Our simulation studies show that the residual bootstrap adaptive lasso + partial ridge provides the lowest mean interval lengths in most cases; however, no single bootstrap method clearly provides the highest coverage probabilities. We also apply each bootstrap method to a real data set, the colon cancer microarray data. The results show that the residual bootstrap adaptive lasso + partial ridge and the paired bootstrap adaptive lasso + partial ridge perform best in terms of mean interval lengths and coverage probabilities, respectively.
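The two-stage procedure described in the abstract can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes a ridge initial estimator for the adaptive-lasso weights, a fixed lasso penalty, and a small partial-ridge penalty of 1/n on the unselected coefficients (as suggested in Liu, Xu, and Li, 2020); all of these tuning choices are assumptions. The residual bootstrap then resamples centered residuals and re-runs both stages to form percentile confidence intervals.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

def adaptive_lasso(X, y, lam=0.1, eps=1e-4):
    """Adaptive lasso via column rescaling; ridge initial estimate (assumed choice)."""
    init = Ridge(alpha=1.0).fit(X, y).coef_
    w = 1.0 / (np.abs(init) + eps)          # adaptive weights
    fit = Lasso(alpha=lam, max_iter=10000).fit(X / w, y)
    return fit.coef_ / w                    # undo the rescaling

def partial_ridge(X, y, selected, lam=None):
    """Refit: least squares with a ridge penalty only on UNselected coefficients."""
    n, p = X.shape
    if lam is None:
        lam = 1.0 / n                       # small penalty, per Liu et al. (2020)
    D = np.ones(p)
    D[selected] = 0.0                       # selected coefficients are unpenalized
    return np.linalg.solve(X.T @ X + lam * n * np.diag(D), X.T @ y)

def residual_bootstrap_alpr(X, y, B=200, level=0.95, seed=0):
    """Percentile CIs from the residual bootstrap of the ALPR estimator."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    sel = np.flatnonzero(adaptive_lasso(X, y) != 0)
    beta_hat = partial_ridge(X, y, sel)
    resid = y - X @ beta_hat
    resid = resid - resid.mean()            # centre residuals before resampling
    boot = np.empty((B, p))
    for b in range(B):
        y_star = X @ beta_hat + rng.choice(resid, size=n, replace=True)
        sel_b = np.flatnonzero(adaptive_lasso(X, y_star) != 0)
        boot[b] = partial_ridge(X, y_star, sel_b)
    alpha = 1.0 - level
    return (np.quantile(boot, alpha / 2, axis=0),
            np.quantile(boot, 1 - alpha / 2, axis=0))

# Toy example with hard sparsity: n = 60, p = 100, 5 nonzero coefficients.
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 100))
beta_true = np.zeros(100)
beta_true[:5] = 2.0
y = X @ beta_true + rng.standard_normal(60)
lo, hi = residual_bootstrap_alpr(X, y, B=100)
```

The paired bootstrap variant would instead resample (x_i, y_i) rows jointly and rerun both stages on each resampled data set.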

Article Details

How to Cite
Chancherngpanich, P., & Pungpapong, V. (2023). A COMPARISON OF BOOTSTRAP METHODS FOR ADAPTIVE LASSO + PARTIAL RIDGE TO CONSTRUCT CONFIDENCE INTERVALS FOR PARAMETERS IN HIGH-DIMENSIONAL SPARSE LINEAR MODELS. Thai Journal of Science and Technology, 10(5), 500–513. https://doi.org/10.14456/tjst.2021.40
Section
Physical Sciences

References

Alon, U., Barkai, N., Notterman, D. A., Gish, K., Ybarra, S., Mack, D., & Levine, A. J. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences of the United States of America, 96(12), 6745–6750.

Chatterjee, A., & Lahiri, S. N. (2011). Bootstrapping lasso estimators. Journal of the American Statistical Association, 106, 608–625.

Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Journal of the American Statistical Association, 65, 55–67.

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning: With applications in R. Springer.

Knight, K., & Fu, W. J. (2000). Asymptotics for lasso-type estimators. The Annals of Statistics, 28, 1356–1378.

Liu, H., & Yu, B. (2013). Asymptotic properties of lasso + mLS and lasso + Ridge in sparse high-dimensional linear regression. Electronic Journal of Statistics, 7, 3124–3169.

Liu, H., Xu, X., & Li, J. J. (2020). A bootstrap lasso + partial ridge method to construct confidence intervals for parameters in high-dimensional sparse linear models. Statistica Sinica, 30(3), 1333–1355.

Pungpapong, V. (2015). A brief review on high-dimensional linear regression. Thai Science and Technology Journal, 23(2), 212–223. (in Thai)

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58, 267–288.

Tibshirani, R. J. (2013). The lasso problem and uniqueness. Electronic Journal of Statistics, 7, 1456–1490.

Wasserman, L., & Roeder, K. (2009). High-dimensional variable selection. The Annals of Statistics, 37(5A), 2178–2201.

Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418–1429.