regress: Switch function for least squares and parsimonious monomvn regressions

Fit a least squares regression, or one of several parsimonious (shrinkage) regressions when least squares fails, in a monomvn friendly format.

Usage

regress(X, y, method = c("lsr", "plsr", "pcr", "lasso", "lar",
"forward.stagewise", "stepwise", "ridge"), p = 0,
ncomp.max = Inf, validation = c("CV", "LOO", "Cp"),
verb = 0, quiet = TRUE)
Arguments

X: data.frame, matrix, or vector of inputs X

y: matrix of responses y of row-length equal to the leading dimension (rows) of X, i.e., nrow(y) == nrow(X); if y is a vector, then nrow may be interpreted as length
"plsr"
(plsr, the default) for partial least sp
is the proportion of the
number of columns to rows in the design matrix before an
alternative regression method
(except "lsr"
)
is performed as if least-squares regression <method
---only meaningful for the "plsr"
or
"pcr"
methods. Large settings can cause the execution to be
slow as it drastically increases the cross-"CV"
(randomized 10-fold cross-validation) is the faster method,
but does not yield a deterministic result and does notverb = 0
) keeps quiet, This
argument is provided for monomvn
and is not intended
to be set by the user via this interfacewarning
s about regressions to be silenced
when TRUE
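As a quick illustration of these arguments, consider the following sketch; the data are simulated here purely for illustration and are not part of the package documentation:

library(monomvn)

## simulated data with more columns than rows, so that a
## parsimonious method is needed in place of least squares
X <- matrix(rnorm(20 * 30), nrow = 20)
y <- rnorm(20)

## pls regression with at most 5 components chosen by LOO CV;
## quiet = FALSE surfaces warnings about any fallbacks
fit <- regress(X, y, method = "plsr", ncomp.max = 5,
               validation = "LOO", quiet = FALSE)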
Value

regress returns a list containing the components listed below.

method: a copy of the method input argument

ncomp: depends on the method used: is NA when method = "lsr"; is the number of principal components for method = "pcr" and method = "plsr"; is the number of non-zero components in the coefficient vector ($b, not counting the intercept) for any of the lars methods; and gives the chosen lambda penalty parameter for method = "ridge"
b: matrix of regression coefficients, with ncol(b) = ncol(y) and the intercept in the first row
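Continuing the sketch from the Arguments section, the returned components of the hypothetical fit object can be inspected directly:

fit$method   ## a copy of the method requested ("plsr" here)
fit$ncomp    ## chosen number of components (or lambda, for "ridge")
fit$b        ## coefficients, with the intercept in the first row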
Details

All methods (except "lsr") require a scheme for
estimating the amount of variability explained by increasing numbers
of non-zero coefficients (or principal components) in the model.
Towards this end, regress
uses CV in all cases except when
nrow(X) <= 10, in which case CV fails and
LOO is used. Whenever nrow(X) <= 3, pcr
fails, so plsr
is used instead.
If quiet = FALSE
then a warning
is given whenever the first choice for a regression fails.
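For instance, the following sketch (again with simulated data) has too few rows for randomized 10-fold CV, so regress falls back to LOO, and quiet = FALSE lets any such warnings through:

## only 8 rows: randomized 10-fold CV is not applicable here
Xs <- matrix(rnorm(8 * 3), nrow = 8)
ys <- rnorm(8)
fit.small <- regress(Xs, ys, method = "plsr", quiet = FALSE)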
For pls methods, RMSEs are calculated for a number of components in 1:ncomp.max, where a NULL ncomp.max is first replaced with

ncomp.max <- min(ncomp.max, ncol(y), nrow(X)-1)

which is the maximum allowed by the pls package.
Simple heuristics are used to select a small number of components (ncomp, for pls) or a small number of non-zero coefficients (for lars) that explains a large amount of the variability (in RMSE terms). The lars methods use a one-standard-error rule, outlined in Section 7.10, page 216 of HTF (see the references below). The pls package does not currently support the calculation of standard errors for CV estimates of RMSE, so a simple linear penalty for increasing ncomp is used instead. The ridge constant (lambda) for lm.ridge is set using the optimize function on the GCV output, as sketched below.
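That selection can be mimicked outside the package roughly as follows; this is an illustrative sketch of the optimize-on-GCV idea with simulated data, not the internal implementation:

library(MASS)

## simulated data for the sketch
X <- matrix(rnorm(50 * 5), nrow = 50)
y <- rnorm(50)

## GCV score of a ridge fit, as a function of the penalty lambda
gcv <- function(lambda)
  as.numeric(lm.ridge(y ~ X, lambda = lambda)$GCV)

## minimize GCV over a plausible range of penalties
optimize(gcv, interval = c(0, 100))$minimum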
References
Bradley Efron, Trevor Hastie, Ian Johnstone and Robert Tibshirani
(2003).
Least Angle Regression (with discussion).
Annals of Statistics 32(2).

Trevor Hastie, Robert Tibshirani and Jerome Friedman (HTF) (2001).
The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
Springer, New York.
See Also

monomvn, blasso; lars in the lars package; lm.ridge in the MASS package; plsr and pcr in the pls package

Examples
## following the lars diabetes example
data(diabetes)
attach(diabetes)
## Ordinary Least Squares regression
reg.ols <- regress(x, y)
## Lasso regression
reg.lasso <- regress(x, y, method="lasso")
## partial least squares regression
reg.plsr <- regress(x, y, method="plsr")
## ridge regression
reg.ridge <- regress(x, y, method="ridge")
## compare the coefs
data.frame(ols=reg.ols$b, lasso=reg.lasso$b,
plsr=reg.plsr$b, ridge=reg.ridge$b)
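## (an illustrative extra, not from the shipped example: compare
## the chosen model complexities via the ncomp component)
c(lasso=reg.lasso$ncomp, plsr=reg.plsr$ncomp, ridge=reg.ridge$ncomp)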
detach(diabetes)