This function takes a linear model and selects the subset of predictor variables that meet a user-specified collinearity threshold measured by the (generalized) variance-inflation factor (VIF).
stepVIF(model, threshold = 10, verbose = FALSE)
model: Linear model (object of class 'lm') containing collinear predictor variables.
threshold: Positive number defining the maximum allowed VIF. Defaults to threshold = 10.
verbose: Logical indicating whether iteration results should be printed. Defaults to verbose = FALSE.
Returns a linear model (object of class 'lm') with low collinearity.
stepVIF starts by computing the VIF of every predictor variable in the linear model. If the linear model
contains categorical predictor variables, generalized variance-inflation factors (GVIF; Fox and Monette,
1992) are calculated instead using vif. GVIF can be interpreted as the inflation in size
of the confidence ellipse or ellipsoid for the coefficients of a predictor variable, compared with
what would be obtained for orthogonal, uncorrelated data. A categorical predictor with k levels has k - 1
degrees of freedom (df), so its confidence ellipsoid has df dimensions, and GVIF must be adjusted to make it
comparable across predictor variables with different df. The adjustment is made using the
following equation:
\(GVIF^{1/(2\times df)}\)
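The determinant definition of Fox and Monette (1992) can be sketched in base R to make the adjustment concrete. gvif_sketch() is a hypothetical helper written for illustration, not the vif function that stepVIF uses. It computes GVIF_j = det(R_jj) det(R_-j,-j) / det(R), where R is the correlation matrix of the model-matrix columns (intercept excluded) and R_jj is the block of columns belonging to term j. The mtcars model is an arbitrary example.

```r
# Sketch of the Fox & Monette (1992) GVIF in base R.
# gvif_sketch() is hypothetical; it is not the vif() used by stepVIF.
gvif_sketch <- function(model) {
  X <- model.matrix(model)
  assign <- attr(X, "assign")          # maps each column to a model term
  X <- X[, assign != 0, drop = FALSE]  # drop the intercept column
  assign <- assign[assign != 0]
  R <- cor(X)                          # correlation matrix of the regressors
  labels <- attr(terms(model), "term.labels")
  out <- t(sapply(seq_along(labels), function(j) {
    in_j <- assign == j                # columns belonging to term j
    # GVIF_j = det(R_jj) * det(R_-j,-j) / det(R)
    gvif <- det(R[in_j, in_j, drop = FALSE]) *
      det(R[!in_j, !in_j, drop = FALSE]) / det(R)
    df <- sum(in_j)
    c(GVIF = gvif, Df = df, adjGVIF = gvif^(1 / (2 * df)))
  }))
  rownames(out) <- labels
  out
}

# factor(cyl) has 3 levels, hence 2 df and 2 model-matrix columns
fit <- lm(mpg ~ disp + hp + wt + factor(cyl), data = mtcars)
gvif_sketch(fit)
```

For a term with a single column (df = 1), GVIF reduces to the ordinary VIF, 1/(1 - R_j²), so the adjusted column is simply sqrt(VIF).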
The next step evaluates whether any predictor variable has a (G)VIF larger than the specified threshold
(default threshold = 10). The adjusted value GVIF^(1/(2*df)) is compared with sqrt(threshold) instead:
for a predictor with df = 1 it reduces to sqrt(VIF), so this keeps both criteria on the same scale.
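A minimal sketch of the threshold check, assuming the (G)VIF values arrive either as a named vector of plain VIFs or as a matrix with GVIF and Df columns (similar to what car's vif returns for models with factors). exceeds_threshold() is a hypothetical name used only for illustration.

```r
# Hypothetical helper: names of predictors that fail the (G)VIF threshold.
exceeds_threshold <- function(vif_table, threshold = 10) {
  if (is.null(dim(vif_table))) {
    # Named vector of ordinary VIFs: compare directly with the threshold
    return(names(vif_table)[vif_table > threshold])
  }
  # Matrix with GVIF and Df columns: use GVIF^(1/(2*df)) against sqrt(threshold)
  adj <- vif_table[, "GVIF"]^(1 / (2 * vif_table[, "Df"]))
  rownames(vif_table)[adj > sqrt(threshold)]
}

exceeds_threshold(c(a = 2, b = 12, c = 15))  # returns c("b", "c")

tab <- rbind(x = c(GVIF = 12, Df = 1),
             f = c(GVIF = 50, Df = 2),
             z = c(GVIF = 3,  Df = 1))
exceeds_threshold(tab)  # returns "x"
```

In the matrix case the factor f passes despite GVIF = 50, because 50^(1/4) is about 2.66, below sqrt(10) (about 3.16), whereas x fails with sqrt(12), about 3.46.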
If only one predictor variable fails the (G)VIF threshold, it is automatically removed from the model and
no further processing occurs. When two or more predictor variables fail the (G)VIF threshold, stepVIF fits
a simple linear model of the dependent variable on each of them separately. The predictor variable whose
model has the lowest adjusted coefficient of determination (adjusted R²) is dropped, and the coefficients
are re-estimated, producing a new linear model.
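The selection rule for two or more offending predictors can be sketched as follows. pick_to_drop() is a hypothetical helper; the data frame, the response name and the vector of offending predictor names are assumed inputs, and the mtcars call is only an example.

```r
# Hypothetical helper: among the offending predictors, return the one whose
# simple regression (response ~ predictor) has the lowest adjusted R^2.
pick_to_drop <- function(data, response, offenders) {
  adj_r2 <- sapply(offenders, function(p) {
    f <- reformulate(p, response = response)  # e.g. mpg ~ hp
    summary(lm(f, data = data))$adj.r.squared
  })
  names(which.min(adj_r2))
}

pick_to_drop(mtcars, "mpg", c("disp", "hp", "wt"))  # returns "hp"
```

With mpg as the response, hp has the lowest adjusted R² of the three (about 0.59, versus about 0.71 for disp and 0.74 for wt), so it is the predictor dropped.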
This process is repeated until all predictor variables included in the new model meet the (G)VIF threshold.
If all predictor variables already have a (G)VIF lower than the threshold, nothing is done and stepVIF returns
the original linear model.
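Putting the steps together, a simplified version of the loop can be sketched in base R. It assumes that all predictors are numeric, so plain VIFs (the diagonal of the inverse correlation matrix of the predictors) suffice, and it mirrors the rules described above, including stopping right after a single offender is removed. step_vif_sketch() is hypothetical and is not the stepVIF source code.

```r
# Simplified sketch of the stepwise procedure; numeric predictors only (assumption).
# step_vif_sketch() is hypothetical, not the stepVIF implementation.
step_vif_sketch <- function(model, threshold = 10) {
  repeat {
    X <- model.matrix(model)[, -1, drop = FALSE]  # drop the intercept column
    if (ncol(X) < 2) return(model)                # VIF needs two or more predictors
    vifs <- diag(solve(cor(X)))                   # VIF_j = [R^-1]_jj
    offenders <- names(vifs)[vifs > threshold]
    if (length(offenders) == 0) return(model)     # all predictors pass
    if (length(offenders) == 1) {
      # Single offender: remove it and stop
      return(update(model, as.formula(paste(". ~ . -", offenders))))
    }
    # Several offenders: drop the one with the lowest adjusted R^2
    # in a simple regression of the response on that predictor alone
    y <- model.response(model.frame(model))
    adj_r2 <- sapply(offenders, function(p) summary(lm(y ~ X[, p]))$adj.r.squared)
    to_drop <- names(which.min(adj_r2))
    model <- update(model, as.formula(paste(". ~ . -", to_drop)))
  }
}

fit <- lm(mpg ~ cyl + disp + hp + wt + drat + qsec, data = mtcars)
res <- step_vif_sketch(fit, threshold = 5)
formula(res)
```

Removing a predictor can never raise the VIF of the remaining ones, which is why the early return after a single offender still leaves every retained predictor under the threshold.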
Fox, J. and Monette, G. (1992) Generalized collinearity diagnostics. Journal of the American Statistical Association, 87, 178–183.
Fox, J. (2008) Applied Regression Analysis and Generalized Linear Models, Second Edition. Sage.
Fox, J. and Weisberg, S. (2011) An R Companion to Applied Regression, Second Edition. Thousand Oaks: Sage.
Hair, J. F., Black, W. C., Babin, B. J. and Anderson, R. E. (2010) Multivariate Data Analysis. New Jersey: Pearson Prentice Hall.
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S, Fourth Edition. Springer.
# car loads the Duncan data (via carData); stepVIF is provided by the pedometrics package
library(car)
library(pedometrics)
fit <- lm(prestige ~ income + education + type, data = Duncan)
fit <- stepVIF(fit, threshold = 10, verbose = TRUE)