regress: Switch function for least squares and parsimonious monomvn regressions

Description

This function fits the specified ordinary least squares or parsimonious
regression (plsr, pcr, ridge, and lars methods) depending on the
arguments provided, and returns estimates of coefficients and
(co-)variances in a monomvn friendly format.

Usage

regress(X, y, method = c("lsr", "plsr", "pcr", "lasso", "lar",
        "forward.stagewise", "stepwise", "ridge", "factor"), p = 0,
        ncomp.max = Inf, validation = c("CV", "LOO", "Cp"),
        verb = 0, quiet = TRUE)
Arguments

X: data.frame, matrix, or vector of inputs X

y: matrix of responses y of row-length equal to the leading dimension
   (rows) of X, i.e., nrow(y) == nrow(X); if y is a vector, then nrow
   may be interpreted as length
"plsr"
(plsr, the default) for partial least s0 <= p="" <="1
is the proportion of the
number of columns to rows in the design matrix before an
alternative regression method
(except "lsr"
)
is performed as if least-square=>method
---only meaningful for the "plsr"
or
"pcr"
methods. Large settings can cause the execution to be
slow as they drastically increase the cros"CV"
(randomized 10-fold cross-validation) is the faster method,
but does not yield a deterministic result and does notverb = 0
) keeps quiet. This argument is provided for
monomvn
and is not intended to be set by the user
via this interfacewarning
s about regressions to be silenced
when TRUE
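
For illustration, a brief hedged sketch of a call exercising these
arguments on synthetic data (the settings are arbitrary choices for
demonstration, not recommendations):

library(monomvn)
set.seed(1)
X <- matrix(rnorm(20 * 12), 20, 12)  ## ncol/nrow = 0.6, exceeding p = 0.5
y <- drop(X %*% rnorm(12) + rnorm(20))
reg <- regress(X, y, method = "pcr", p = 0.5, ncomp.max = 10,
               validation = "LOO", quiet = FALSE)
reg$method  ## the parsimonious method actually used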
Value

regress returns a list containing the components listed below.

method: a copy of the method input argument

ncomp: depends on the method used: is NA when method = "lsr"; is the
   number of principal components for method = "pcr" and method =
   "plsr"; is the number of non-zero components in the coefficient
   vector ($b, not counting the intercept) for any of the lars
   methods; and gives the chosen $\lambda$ penalty parameter for
   method = "ridge"

lambda: if method is one of c("lasso", "forward.stagewise", "ridge"),
   then this field records the $\lambda$ penalty parameter used

b: matrix containing the estimated regression coefficients, with
   ncol(b) = ncol(y) and the intercept in the first row
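
For instance, a small sketch of inspecting these components after a
fit (assuming the lars diabetes data used in the Examples below):

library(monomvn)
data(diabetes, package = "lars")
fit <- regress(diabetes$x, diabetes$y, method = "lasso")
fit$method  ## "lasso"
fit$ncomp   ## number of non-zero coefficients in fit$b
fit$lambda  ## the recorded lasso penalty
fit$b       ## coefficients; intercept in the first row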
Details

All methods (except "lsr") require a scheme for
estimating the amount of variability explained by increasing numbers
of non-zero coefficients (or principal components) in the model.
Towards this end, the regress
function uses CV in all cases
except when nrow(X) <= 10, in which case CV fails and
LOO is used. Whenever nrow(X) <= 3, pcr
fails, so plsr
is used instead.
If quiet = FALSE
then a warning
is given whenever the first choice for a regression fails.
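
The fallback rules can be summarized in a short sketch; the function
below is purely illustrative (its name is hypothetical, not a monomvn
internal):

## encodes the stated fallbacks: LOO when CV fails, plsr when pcr fails
choose_scheme <- function(X, method = "pcr", validation = "CV") {
  if (nrow(X) <= 10) validation <- "LOO"                 ## CV fails here
  if (nrow(X) <= 3 && method == "pcr") method <- "plsr"  ## pcr fails here
  list(method = method, validation = validation)
}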
For pls methods, RMSEs are calculated for a number of components in
1:ncomp.max, where a NULL value for ncomp.max is replaced with

ncomp.max <- min(ncomp.max, ncol(y), nrow(X)-1)

which is the maximum allowed by the pls package.
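
As an illustration, the CV RMSEs can be computed directly with the pls
package; this is only a sketch of the calculation described above, not
monomvn's exact code:

library(pls)
set.seed(1)
df <- data.frame(y = rnorm(50), X = I(matrix(rnorm(50 * 8), 50, 8)))
fit <- plsr(y ~ X, ncomp = 5, data = df, validation = "CV")
RMSEP(fit, estimate = "CV")  ## RMSE for 0, 1, ..., 5 components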
Simple heuristics are used to select a small number of components
(ncomp for pls) or number of coefficients (for lars) which explains
a large amount of the variability (RMSE). The lars methods use the
one-standard-error rule outlined in Section 7.10, page 216 of HTF
(see References below). The pls package does not currently support
the calculation of standard errors for CV estimates of RMSE, so a
simple linear penalty for increasing ncomp is used instead. The
ridge constant (lambda) for lm.ridge is set using the optimize
function on the GCV output.
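
That last step can be sketched as follows (an illustration of the
approach, not monomvn's exact code):

library(MASS)
set.seed(1)
X <- matrix(rnorm(100 * 5), 100, 5)
y <- drop(X %*% rnorm(5) + rnorm(100))
gcv <- function(lambda) lm.ridge(y ~ X, lambda = lambda)$GCV
## search an assumed interval for the lambda minimizing GCV
optimize(gcv, interval = c(0, 100))$minimum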
References

Bjorn-Helge Mevik and Ron Wehrens (2007). The pls Package: Principal
Component and Partial Least Squares Regression in R. Journal of
Statistical Software 18(2).

Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani
(2004). Least Angle Regression (with discussion). Annals of
Statistics 32(2).

Trevor Hastie, Robert Tibshirani and Jerome Friedman (2001). The
Elements of Statistical Learning. Springer, NY. [HTF]
See Also

monomvn, blasso, lars in the lars package, lm.ridge in the MASS
package, plsr and pcr in the pls package
Examples

## following the lars diabetes example
data(diabetes)
attach(diabetes)
## Ordinary Least Squares regression
reg.ols <- regress(x, y)
## Lasso regression
reg.lasso <- regress(x, y, method="lasso")
## partial least squares regression
reg.plsr <- regress(x, y, method="plsr")
## ridge regression
reg.ridge <- regress(x, y, method="ridge")
## compare the coefs
data.frame(ols=reg.ols$b, lasso=reg.lasso$b,
plsr=reg.plsr$b, ridge=reg.ridge$b)
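## the chosen model complexities (the ncomp component described in
## Value above) can be compared similarly; a short follow-on sketch
c(ols = reg.ols$ncomp, lasso = reg.lasso$ncomp,
  plsr = reg.plsr$ncomp, ridge = reg.ridge$ncomp)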
detach(diabetes)