regress: Switch function for least squares and parsimonious monomvn regressions

Description

This function fits the specified ordinary least squares or parsimonious
regression (plsr, pcr, ridge, and lars methods) depending on the
arguments provided, and returns estimates of coefficients and
(co-)variances in a monomvn friendly format.

Usage

regress(X, y, method = c("lsr", "plsr", "pcr", "lasso", "lar",
        "forward.stagewise", "stepwise", "ridge", "factor"), p = 0,
        ncomp.max = Inf, validation = c("CV", "LOO", "Cp"),
        verb = 0, quiet = TRUE)
Arguments

X: data.frame, matrix, or vector of inputs X

y: matrix of responses y of row-length equal to the leading dimension
   (rows) of X, i.e., nrow(y) == nrow(X); if y is a vector, then nrow
   may be interpreted as length
"plsr"
(plsr, the default) for partial least s0 <= p="" <="1
is the proportion of the
number of columns to rows in the design matrix before an
alternative regression method
(except "lsr"
)
is performed as if least-square=>method
---only meaningful for the "plsr"
or
"pcr"
methods. Large settings can cause the execution to be
slow as they drastically increase the cros"CV"
(randomized 10-fold cross-validation) is the faster method,
but does not yield a deterministic result and does notverb = 0
) keeps quiet. This argument is provided for
monomvn
and is not intended to be set by the user
via this interfacewarning
s about regressions to be silenced
when TRUE
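
For illustration, a brief hedged sketch of a call exercising these
arguments on synthetic data (the settings are arbitrary choices for
demonstration, not recommendations):

library(monomvn)
set.seed(1)
X <- matrix(rnorm(20 * 12), 20, 12)  ## ncol/nrow = 0.6, exceeding p = 0.5
y <- drop(X %*% rnorm(12) + rnorm(20))
reg <- regress(X, y, method = "pcr", p = 0.5, ncomp.max = 10,
               validation = "LOO", quiet = FALSE)
reg$method  ## the parsimonious method actually used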
Value

regress returns a list containing the components listed below.

method: a copy of the method input argument

ncomp: depends on the method used: is NA when method = "lsr"; is the
   number of principal components for method = "pcr" and method =
   "plsr"; is the number of non-zero components in the coefficient
   vector ($b, not counting the intercept) for any of the lars
   methods; and gives the chosen $\lambda$ penalty parameter for
   method = "ridge"

lambda: if method is one of c("lasso", "forward.stagewise", "ridge"),
   then this field records the $\lambda$ penalty parameter used

b: matrix containing the estimated regression coefficients, with
   ncol(b) = ncol(y) and the intercept in the first row
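
For instance, a small sketch of inspecting these components after a
fit (assuming the lars diabetes data used in the Examples below):

library(monomvn)
data(diabetes, package = "lars")
fit <- regress(diabetes$x, diabetes$y, method = "lasso")
fit$method  ## "lasso"
fit$ncomp   ## number of non-zero coefficients in fit$b
fit$lambda  ## the recorded lasso penalty
fit$b       ## coefficients; intercept in the first row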
Details

All methods (except "lsr") require a scheme for
estimating the amount of variability explained by increasing numbers
of non-zero coefficients (or principal components) in the model.
Towards this end, the regress
function uses CV in all cases
except when nrow(X) <= 10, in which case CV fails and
LOO is used. Whenever nrow(X) <= 3, pcr
fails, so plsr
is used instead.
If quiet = FALSE
then a warning
is given whenever the first choice for a regression fails.
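
The fallback rules can be summarized in a short sketch; the function
below is purely illustrative (its name is hypothetical, not a monomvn
internal):

## encodes the stated fallbacks: LOO when CV fails, plsr when pcr fails
choose_scheme <- function(X, method = "pcr", validation = "CV") {
  if (nrow(X) <= 10) validation <- "LOO"                 ## CV fails here
  if (nrow(X) <= 3 && method == "pcr") method <- "plsr"  ## pcr fails here
  list(method = method, validation = validation)
}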
For pls methods, RMSEs are calculated for a number of components in
1:ncomp.max, where a NULL value for ncomp.max is replaced with

ncomp.max <- min(ncomp.max, ncol(y), nrow(X)-1)

which is the maximum allowed by the pls package.
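
As an illustration, the CV RMSEs can be computed directly with the pls
package; this is only a sketch of the calculation described above, not
monomvn's exact code:

library(pls)
set.seed(1)
df <- data.frame(y = rnorm(50), X = I(matrix(rnorm(50 * 8), 50, 8)))
fit <- plsr(y ~ X, ncomp = 5, data = df, validation = "CV")
RMSEP(fit, estimate = "CV")  ## RMSE for 0, 1, ..., 5 components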
Simple heuristics are used to select a small number of components
(ncomp for pls) or number of coefficients (for lars) which explains
a large amount of the variability (RMSE). The lars methods use the
one-standard-error rule outlined in Section 7.10, page 216 of HTF
(see References below). The pls package does not currently support
the calculation of standard errors for CV estimates of RMSE, so a
simple linear penalty for increasing ncomp is used instead. The
ridge constant (lambda) for lm.ridge is set using the optimize
function on the GCV output.
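
That last step can be sketched as follows (an illustration of the
approach, not monomvn's exact code):

library(MASS)
set.seed(1)
X <- matrix(rnorm(100 * 5), 100, 5)
y <- drop(X %*% rnorm(5) + rnorm(100))
gcv <- function(lambda) lm.ridge(y ~ X, lambda = lambda)$GCV
## search an assumed interval for the lambda minimizing GCV
optimize(gcv, interval = c(0, 100))$minimum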
References

Bjorn-Helge Mevik and Ron Wehrens (2007). The pls Package: Principal
Component and Partial Least Squares Regression in R. Journal of
Statistical Software 18(2).

Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani
(2004). Least Angle Regression (with discussion). Annals of
Statistics 32(2).

Trevor Hastie, Robert Tibshirani and Jerome Friedman (2001). The
Elements of Statistical Learning. Springer, NY. [HTF]
See Also

monomvn, blasso, lars in the lars package, lm.ridge in the MASS
package, plsr and pcr in the pls package
Examples

## following the lars diabetes example
data(diabetes)
attach(diabetes)
## Ordinary Least Squares regression
reg.ols <- regress(x, y)
## Lasso regression
reg.lasso <- regress(x, y, method="lasso")
## partial least squares regression
reg.plsr <- regress(x, y, method="plsr")
## ridge regression
reg.ridge <- regress(x, y, method="ridge")
## compare the coefs
data.frame(ols=reg.ols$b, lasso=reg.lasso$b,
plsr=reg.plsr$b, ridge=reg.ridge$b)
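## the chosen model complexities (the ncomp component described in
## Value above) can be compared similarly; a short follow-on sketch
c(ols = reg.ols$ncomp, lasso = reg.lasso$ncomp,
  plsr = reg.plsr$ncomp, ridge = reg.ridge$ncomp)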
detach(diabetes)