areg and its use of regression splines) to determine how well each variable can be predicted from the remaining variables. Variables are dropped in a stepwise fashion, removing the most predictable variable at each step. The remaining variables are used to predict. The process continues until no variable still in the list of predictors can be predicted with an $R^2$ or adjusted $R^2$ of at least r2, or until dropping the variable with the highest $R^2$ (adjusted or ordinary) would cause a variable that was dropped earlier to no longer be predicted at least at the r2 level from the now smaller list of predictors.
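The stepwise procedure described above can be sketched in base R. This is a simplified illustration only, not the code used by redun: it fits ordinary linear models with lm() in place of the flexible additive models, handles numeric variables only, and uses ordinary $R^2$. The function name redun_sketch and its arguments are hypothetical.

```r
# Simplified sketch of the stepwise redundancy search (linear fits only).
# redun_sketch() is a hypothetical helper, not part of Hmisc.
redun_sketch <- function(data, r2 = 0.9) {
  # Ordinary R^2 from regressing `target` on the variables named in `preds`
  rsq <- function(target, preds) {
    fit <- lm(reformulate(preds, response = target), data = data)
    summary(fit)$r.squared
  }
  kept    <- names(data)
  dropped <- character(0)
  while (length(kept) > 1) {
    # How well can each remaining variable be predicted from the others?
    r2s  <- sapply(kept, function(v) rsq(v, setdiff(kept, v)))
    best <- names(which.max(r2s))
    if (r2s[[best]] < r2) break   # nothing left is predictable enough
    candidate <- setdiff(kept, best)
    # Variables dropped earlier must stay predictable from the smaller list
    still_ok <- all(vapply(dropped,
                           function(v) rsq(v, candidate) >= r2,
                           logical(1)))
    if (!still_ok) break
    kept    <- candidate
    dropped <- c(dropped, best)
  }
  list(kept = kept, dropped = dropped)
}

set.seed(1)
n  <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- x1 + x2 + runif(n) / 10
x4 <- x1 + x2 + x3 + runif(n) / 10
res <- redun_sketch(data.frame(x1, x2, x3, x4), r2 = 0.8)
res$kept
res$dropped
```

In this toy data x3 and x4 are nearly exact linear combinations of x1 and x2, so the sketch drops them and stops once only the two independent variables remain.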
redun(formula, data=NULL, subset=NULL, r2=0.9, type=c("ordinary", "adjusted"), nk=3, tlinear=TRUE, allcat=FALSE, minfreq=0, iterms=FALSE, pc=FALSE, pr=FALSE, ...)
# S3 method for class 'redun'
print(x, digits=3, long=TRUE, ...)
formula: a formula. Enclose a variable in I() to force linearity.
data: a data frame.
subset: usual subsetting expression.
r2: ordinary or adjusted $R^2$ cutoff for redundancy.
type: specify "adjusted" to use adjusted $R^2$.
nk: number of knots to use for continuous variables. Use nk=0 to force linearity for all variables.
tlinear: set to FALSE to allow a variable to be automatically nonlinearly transformed (see areg) while being predicted. By default, only continuous variables on the right hand side (i.e., while they are being predictors) are automatically transformed, using regression splines. Estimating transformations for target (dependent) variables causes more overfitting than doing so for predictors.
allcat: set to TRUE to ensure that all categories of categorical variables having more than two categories are redundant (see details below).
minfreq: for a binary or categorical variable, there must be at least two categories with at least minfreq observations or the variable will be dropped and not checked for redundancy against other variables. minfreq also specifies the minimum frequency of a category or its complement before that category is considered when allcat=TRUE.
iterms: set to TRUE to consider derived terms (dummy variables and nonlinear spline components) as separate variables. This will perform a redundancy analysis on pieces of the variables.
pc: if iterms=TRUE you can set pc to TRUE to replace the submatrix of terms corresponding to each variable with the orthogonal principal components before doing the redundancy analysis. The components are based on the correlation matrix.
pr: set to TRUE to monitor progress of the stepwise algorithm.
...: arguments to pass to dataframeReduce to remove "difficult" variables from data if formula is ~. to use all variables in data (data must be specified when these arguments are used). Ignored for print.
x: an object created by redun.
digits: number of digits to which to round $R^2$ values when printing.
long: set to FALSE to prevent the print method from printing the $R^2$ history and the original $R^2$ with which each variable can be predicted from ALL other variables.

Value: an object of class "redun".
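The pc option replaces each variable's block of derived terms with orthogonal principal components based on the correlation matrix. As a rough base-R illustration of that idea (not redun's internal code), the sketch below builds a made-up block of polynomial terms for one variable, standing in for spline terms, and calls prcomp() with scale. = TRUE so that the components are computed from the correlation matrix.

```r
set.seed(2)
x <- runif(50)

# Hypothetical block of derived terms for a single variable
terms <- cbind(x = x, x2 = x^2, x3 = x^3)

# scale. = TRUE standardizes the columns first, so the principal
# components are those of the correlation matrix
pcs <- prcomp(terms, scale. = TRUE)$x

round(cor(terms), 2)  # the original terms are strongly correlated
round(cor(pcs), 2)    # the component scores are mutually uncorrelated
```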
city might be declared redundant even though tied cities might be deemed non-redundant in another setting. To ensure that all categories may be predicted well from other variables, use the allcat option. To ignore categories that are too infrequent or too frequent, set minfreq to a nonzero integer. When the number of observations in the category is below this number or the number of observations not in the category is below this number, no attempt is made to predict observations being in that category individually for the purpose of redundancy detection.

See also: areg, dataframeReduce, transcan, varclus, subselect::genetic
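To make the minfreq rule concrete, the following base-R sketch tabulates each category's frequency and that of its complement, then lists the categories that meet a given threshold. It only illustrates the rule as worded in the details; the helper name eligible_categories is hypothetical, and this is not a description of how redun is implemented internally.

```r
# Hypothetical helper: which categories have both the category and its
# complement observed at least `minfreq` times?
eligible_categories <- function(f, minfreq) {
  counts <- table(f)    # frequency of each category (NAs excluded)
  n      <- sum(counts) # number of non-missing observations
  ok     <- as.vector(counts >= minfreq & (n - counts) >= minfreq)
  names(counts)[ok]
}

x5 <- factor(c(rep("a", 45), rep("b", 50), rep("c", 5)))
eligible_categories(x5, minfreq = 40)
```

Here category c is too infrequent (5 observations) and is skipped, while a and b each have at least 40 observations both inside and outside the category.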
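library(Hmisc)  # provides redun()
set.seed(1)
n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- x1 + x2 + runif(n)/10        # nearly a linear combination of x1 and x2
x4 <- x1 + x2 + x3 + runif(n)/10   # nearly a linear combination of x1, x2 and x3
x5 <- factor(sample(c('a','b','c'),n,replace=TRUE))
x6 <- 1*(x5=='a' | x5=='c')        # indicator that x5 is 'a' or 'c', i.e. not 'b'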
redun(~x1+x2+x3+x4+x5+x6, r2=.8)
redun(~x1+x2+x3+x4+x5+x6, r2=.8, minfreq=40)
redun(~x1+x2+x3+x4+x5+x6, r2=.8, allcat=TRUE)
# x5 is no longer redundant but x6 is
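As a further usage sketch, the call below supplies the variables through the data argument, switches to adjusted $R^2$ via type, and prints the result with long=FALSE to suppress the $R^2$ history. It relies only on arguments listed in the usage and assumes the Hmisc package is installed; the data frame d is made up for illustration.

```r
library(Hmisc)  # assumed to be installed; provides redun()

set.seed(1)
n <- 100
d <- data.frame(x1 = runif(n), x2 = runif(n))
d$x3 <- d$x1 + d$x2 + runif(n) / 10  # nearly determined by x1 and x2

r <- redun(~ x1 + x2 + x3, data = d, r2 = 0.8, type = "adjusted")
class(r)
print(r, long = FALSE)
```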