regress: fit a regression model and return the results in a monomvn friendly format

Usage

regress(X, y, method = c("lsr", "plsr", "pcr", "lasso", "lar",
        "forward.stagewise", "stepwise", "ridge", "factor"), p = 0,
        ncomp.max = Inf, validation = c("CV", "LOO", "Cp"),
        verb = 0, quiet = TRUE)

Arguments

X: data.frame, matrix, or vector of inputs X

y: matrix of responses y of row-length equal to the leading dimension
(rows) of X, i.e., nrow(y) == nrow(X); if y is a vector, then nrow may
be interpreted as length

method: the type of regression to perform, as listed in Usage above.
"lsr" is ordinary least squares; "plsr" (plsr, the default) is partial
least squares and "pcr" (pcr) is principal component regression, both
from the pls package; "lasso", "lar", "forward.stagewise" and
"stepwise" come from the lars package; "ridge" is ridge regression as
implemented by lm.ridge in the MASS package

p: 0 <= p <= 1 is the proportion of the number of columns to rows in
the design matrix before an alternative regression method (except
"lsr") is performed as if least-squares regression had failed

ncomp.max: maximal number of (principal) components to
consider---only meaningful for the "plsr" or "pcr" methods. Large
settings can cause the execution to be slow as they drastically
increase the cross-validation time

validation: the scheme used to validate the parsimonious regression.
"CV" (randomized 10-fold cross-validation) is the faster method, but
does not yield a deterministic result and does not apply when
nrow(X) <= 10, in which case "LOO" (leave-one-out cross-validation)
is used instead (see Details)

verb: verbosity level; the default (verb = 0) keeps quiet. This
argument is provided for monomvn and is not intended to be set by the
user via this interface

quiet: causes warnings about regressions to be silenced when TRUE
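For instance, the following sketch (not taken from the package
documentation; it assumes the lars package is installed so that the
"lasso" method is available, and the dimensions are made up)
illustrates how p interacts with the shape of the design matrix:

## with 20 rows and 15 columns, ncol(X)/nrow(X) = 0.75 exceeds p = 0.5,
## so the parsimonious "lasso" fit should be used rather than falling
## back on ordinary least squares
library(monomvn)
set.seed(1)
X <- matrix(rnorm(20 * 15), nrow = 20, ncol = 15)
y <- rnorm(20)
fit <- regress(X, y, method = "lasso", p = 0.5)
fit$method   ## a copy of the method argument, as described under Value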
Value

regress returns a list containing the components listed below.

method: a copy of the method input argument

ncomp: depends on the method used: is NA when method = "lsr"; is the
number of principal components for method = "pcr" and method =
"plsr"; is the number of non-zero components in the coefficient
vector ($b, not counting the intercept) for any of the lars methods;
and gives the chosen $\lambda$ penalty parameter for method = "ridge"

lambda: if method is one of c("lasso", "forward.stagewise", "ridge"),
then this field records the $\lambda$ penalty parameter used

b: matrix of estimated regression coefficients with ncol(b) = ncol(y)
and the intercept in the first row

Details

All methods (except "lsr") require a scheme for estimating the amount
of variability explained by increasing numbers of non-zero
coefficients (or principal components) in the model. Towards this
end, the regress function uses CV in all cases except when
nrow(X) <= 10, in which case CV fails and LOO is used. Whenever
nrow(X) <= 3 pcr fails, so plsr is used instead. If quiet = FALSE
then a warning is given whenever the first choice for a regression
fails.
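As a small illustration of this fallback (a sketch assuming monomvn
and the pls package are installed; the data are made up):

## with only 8 rows, nrow(X) <= 10, so the randomized 10-fold CV
## described above is expected to give way to LOO validation;
## quiet = FALSE surfaces a warning if the first choice fails
library(monomvn)
set.seed(2)
X <- matrix(rnorm(8 * 4), nrow = 8, ncol = 4)
y <- rnorm(8)
fit <- regress(X, y, method = "plsr", quiet = FALSE)
fit$ncomp   ## number of components selected by the validation scheme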
For pls methods, RMSEs are calculated for a number of components in
1:ncomp.max, where a NULL value for ncomp.max is replaced with

    ncomp.max <- min(ncomp.max, ncol(y), nrow(X)-1)

which is the max allowed by the pls package.
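Spelled out with made-up dimensions (a sketch of the expression above,
not code from the package):

## the ncomp.max cap for a hypothetical 12 x 30 design matrix and a
## two-column response; here the bound comes from ncol(y)
X <- matrix(rnorm(12 * 30), nrow = 12, ncol = 30)
y <- matrix(rnorm(12 * 2), nrow = 12, ncol = 2)
ncomp.max <- Inf
ncomp.max <- min(ncomp.max, ncol(y), nrow(X) - 1)
ncomp.max   ## 2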
Simple heuristics are used to select a small number of components
(ncomp for pls), or a small number of coefficients (for lars), that
still explains a large amount of the variability (RMSE).
The lars methods use a one-standard error rule outlined
in Section 7.10, page 216 of HTF below. The
pls package does not currently support the calculation of
standard errors for CV estimates of RMSE, so a simple linear penalty
for increasing ncomp is used instead. The ridge constant
(lambda) for lm.ridge is set using the optimize
function on the GCV output.
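As an illustration of that last step, the sketch below picks a ridge
constant by minimizing lm.ridge's GCV score with optimize; it mirrors
the approach described above but is not the package's own code, and
the data are made up.

## choose lambda for ridge regression by minimizing the GCV criterion
## reported by MASS::lm.ridge over a fixed search interval
library(MASS)
set.seed(3)
X <- matrix(rnorm(50 * 5), nrow = 50, ncol = 5)
y <- drop(X %*% rnorm(5)) + rnorm(50)
gcv <- function(lambda) {
  lm.ridge(y ~ X, lambda = lambda)$GCV   ## GCV score at this lambda
}
opt <- optimize(gcv, interval = c(0, 100))
opt$minimum   ## the chosen ridge constant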
References

Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani
(2004). Least Angle Regression (with discussion).
Annals of Statistics 32(2).

Trevor Hastie, Robert Tibshirani and Jerome Friedman (2001).
The Elements of Statistical Learning: Data Mining, Inference, and
Prediction. Springer, NY. (HTF)

See Also

monomvn, blasso, lars in the lars package, lm.ridge in the MASS
package, and plsr and pcr in the pls package

Examples
## following the lars diabetes example
data(diabetes)
attach(diabetes)
## Ordinary Least Squares regression
reg.ols <- regress(x, y)
## Lasso regression
reg.lasso <- regress(x, y, method="lasso")
## partial least squares regression
reg.plsr <- regress(x, y, method="plsr")
## ridge regression
reg.ridge <- regress(x, y, method="ridge")
## compare the coefs
data.frame(ols=reg.ols$b, lasso=reg.lasso$b,
plsr=reg.plsr$b, ridge=reg.ridge$b)
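## inspect the model-selection output documented under Value above
## (a usage sketch; which fields are filled depends on the method)
reg.lasso$ncomp    ## number of non-zero lasso coefficients selected
reg.lasso$lambda   ## lasso penalty parameter actually used
reg.ridge$lambda   ## ridge constant chosen via GCV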
detach(diabetes)