Defense Date
5-10-2017
Graduation Date
Summer 1-1-2017
Availability
Immediate Access
Submission Type
thesis
Degree Name
MS
Department
Computational Mathematics
School
McAnulty College and Graduate School of Liberal Arts
Committee Chair
Frank D'Amico
Committee Member
John Kern
Committee Member
Sean Tierney
Keywords
Dimension Reduction; Partial Least Squares; Penalized Regression; Predictive Modeling; Regression; Selective Inference
Abstract
Several problems arise when attempting to use traditional predictive modeling techniques on ‘big data.’ For instance, multiple linear regression models cannot be used on datasets with hundreds of variables. However, several techniques are becoming common tools for selective inference as the need for analyzing big data increases. Forward selection and penalized regression models (such as LASSO, Ridge Regression, and Elastic Net) are simple modifications of multiple linear regression that can provide some guidance on simplifying a model through variable selection. Dimension-reduction techniques, such as Partial Least Squares and Principal Components Analysis, are more complex than regression but can handle highly correlated independent variables. Each of the aforementioned techniques is valuable in predictive modeling if used properly. This paper provides a mathematical introduction to these developments in selective inference. A sample dataset is used to demonstrate modeling and interpretation. Further, the applications to big data, as well as the advantages and disadvantages of each procedure, are discussed.
Language
English
Recommended Citation
Papke, S. (2017). A Review of 'Big Data' Variable Selection Procedures For Use in Predictive Modeling (Master's thesis, Duquesne University). Retrieved from https://dsc.duq.edu/etd/182