Volume 25, Issue 1 (1-2021)
Andishe 2021, 25(1): 69-90
Regression Analysis Methods for High-dimensional Data
Monireh Maanavi , Mahdi Roozbeh *
Semnan University
Abstract:

With the evolution of science, knowledge, and technology, new and precise methods for measuring, collecting, and recording information have been developed, resulting in the appearance and growth of high-dimensional data. A high-dimensional data set, i.e., one in which the number of explanatory variables is much larger than the number of observations, cannot be easily analyzed by traditional and classical methods such as ordinary least squares, and its interpretation becomes very complex. Although in classical regression analysis the ordinary least-squares estimator is the best estimator when the essential assumptions hold, it is not applicable to high-dimensional data, and under these conditions modern methods must be applied. In this research, we first discuss the drawbacks of classical methods in the analysis of high-dimensional data, and then introduce and explain modern and common approaches to regression analysis for high-dimensional data, such as principal component analysis and penalized methods. Finally, a simulation study and a real-world data analysis are performed to apply and compare the mentioned methods on high-dimensional data.
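The failure of ordinary least squares described above, and the penalized remedy, can be illustrated with a minimal NumPy sketch (the data, dimensions, and variable names are illustrative assumptions, not taken from the paper): when p > n, the Gram matrix X'X is singular, so the normal equations have no unique solution, while the ridge-penalized system is always invertible.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 100                     # high-dimensional: p >> n
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]   # sparse "true" coefficients
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# X'X is p x p but has rank at most n < p, so the OLS normal
# equations (X'X) b = X'y do not determine a unique solution.
gram = X.T @ X
rank = np.linalg.matrix_rank(gram)  # at most n = 30, far below p = 100

# Ridge (penalized least squares): adding lam * I makes the system
# positive definite, hence solvable, for any lam > 0.
lam = 1.0
beta_ridge = np.linalg.solve(gram + lam * np.eye(p), X.T @ y)
```

The same singularity argument motivates all the penalized estimators the paper surveys; ridge is shown here only because it has a closed form.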

Keywords: High-dimensional data set, Penalized least-squares method, Principal component analysis.
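The principal-component approach named in the keywords can likewise be sketched in a few lines of NumPy (again an illustrative toy, not the authors' implementation): project the centered design matrix onto its top k right singular vectors, run well-posed OLS on the k < n scores, and map the fitted coefficients back to the original p variables.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 30, 100, 5               # keep k principal components, k < n
X = rng.standard_normal((n, p))
y = X[:, :5].sum(axis=1) + 0.1 * rng.standard_normal(n)

# Center, then take the top-k right singular vectors as loadings.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T                  # n x k score matrix: low-dimensional design

# OLS on the k scores is well-posed because k < n.
gamma, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
beta_pcr = Vt[:k].T @ gamma        # back-transform to p original coefficients
```

The choice of k trades off dimension reduction against information loss; in practice it is tuned, e.g., by the proportion of variance explained or by cross-validation.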
Full-Text [PDF 1294 kb]
Type of Study: Research | Subject: Special
Received: 2020/07/21 | Accepted: 2021/01/20 | Published: 2021/01/29
References
1. Akdeniz, F. and Roozbeh, M. (2019). Generalized difference-based weighted mixed almost unbiased ridge estimator in partially linear models, Statistical Papers, 60(5), 1717-1739.
2. Bellman, R. (1961). Adaptive Control Processes, Princeton University Press, London.
3. Bertsimas, D. and Van Parys, B. (2020). Sparse high-dimensional regression: exact scalable algorithms and phase transitions, Biostatistics, 21(2), 219-235.
4. Bondell, H. D. and Reich, B. J. (2008). Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR, Biometrics, 64(1), 115-123.
5. Breiman, L. (1995). Better subset regression using the nonnegative garrote, Technometrics, 37(4), 373-384.
6. Efron, B. and Hastie, T. (2017). Computer Age Statistical Inference, Cambridge University Press, Cambridge.
7. Everitt, B. and Hothorn, T. (2011). An Introduction to Applied Multivariate Analysis with R, Springer, New York.
8. Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties, Journal of the American Statistical Association, 96(456), 1348-1360.
9. Fan, J. (1997). Comments on "Wavelets in statistics: a review" by A. Antoniadis, Journal of the Italian Statistical Society, 6(2), 131-138.
10. Frank, I. E. and Friedman, J. H. (1993). A statistical view of some chemometrics regression tools, Technometrics, 35(2), 109-135.
11. Hastie, T. J. and Pregibon, D. (1992). Generalized linear models, in Statistical Models in S, Wadsworth & Brooks/Cole, Pacific Grove.
12. Hoerl, A. E. (1962). Application of ridge analysis to regression problems, Chemical Engineering Progress, 58(1), 54-59.
13. Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression: applications to nonorthogonal problems, Technometrics, 12(1), 69-82.
14. Jolliffe, I. T. (2002). Principal Component Analysis, Springer Series in Statistics, Springer, New York.
15. Li, B. and Yu, Q. (2009). Robust and sparse bridge regression, Statistics and Its Interface, 2(4), 481-491.
16. Pearson, K. (1901). On lines and planes of closest fit to systems of points in space, Philosophical Magazine, 2(11), 559-572.
17. Roozbeh, M. (2018). Optimal QR-based estimation in partially linear regression models with correlated errors using GCV criterion, Computational Statistics & Data Analysis, 117, 45-61.
18. Roozbeh, M., Babaie-Kafaki, S. and Naeimi Sadigh, A. (2018). A heuristic approach to combat multicollinearity in least trimmed squares regression analysis, Mathematical Modelling, 57(2), 105-120.
19. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society, Series B, 58(1), 267-288.
20. Walker, D. A. and Smith, T. J. (2020). Logistic regression under sparse data conditions, Journal of Modern Applied Statistical Methods, 18(2), 33-72.
21. Wasserman, L. (2006). All of Nonparametric Statistics, Springer Science and Business Media, New York.
22. Watkins, D. S. (2002). Fundamentals of Matrix Computations, John Wiley and Sons, New York.
23. Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society, Series B, 67(2), 301-320.
Citation:
Maanavi M, Roozbeh M. Regression Analysis Methods for High-dimensional Data. Andishe 2021; 25 (1) :69-90
URL: http://andisheyeamari.irstat.ir/article-1-814-en.html


Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.