escv.glmnet: escv glmnet

Description

Performs k-fold estimation stability with cross-validation (escv) for glmnet and returns the selected values of lambda.

Usage

escv.glmnet(x, y, lambda = NULL, nfolds = 10, foldid, cv.OLS = FALSE, tau = 0,
            parallel = FALSE, standardize = TRUE, intercept = TRUE, ...)

Arguments

x

Input matrix as in glmnet, of dimension nobs x nvars; each row is an observation vector. Can be in sparse matrix format (inherit from class "sparseMatrix" as in package Matrix).

y

Response variable.

lambda

Optional user-supplied lambda sequence for the Lasso; default is NULL, and glmnet chooses its own sequence.

nfolds

Number of folds - default is 10.

foldid

An optional vector of values between 1 and nfolds identifying which fold each observation is in. If supplied, nfolds can be missing.

cv.OLS

If TRUE, uses the two-stage estimator Lasso+OLS in the fits (using Lasso to select variables/predictors and then using OLS to refit the coefficients for the selected variables/predictors). The default value is FALSE.

tau

Tuning parameter in modified Least Squares (mls). Default value is 0, which corresponds to OLS.

parallel

If TRUE, use parallel foreach to fit each fold. A parallel backend, such as doParallel, must be registered beforehand. See the example below.

standardize

Logical flag for x variable standardization, prior to fitting the model sequence. Default is standardize=TRUE.

intercept

Should intercept be fitted (default is TRUE) or set to zero (FALSE).

...

Other arguments that can be passed to glmnet.
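The foldid argument described above can be built by hand when the same fold assignment should be reused across runs. A minimal sketch in base R, assuming a balanced random assignment is wanted; the values of n and nfolds are illustrative only:

```r
# Build a balanced, reproducible fold assignment for n observations.
set.seed(1)
n <- 200        # number of observations (illustrative value)
nfolds <- 10    # number of folds

# Repeat the labels 1..nfolds until there are n of them, then shuffle.
foldid <- sample(rep(seq_len(nfolds), length.out = n))

table(foldid)   # each fold receives n / nfolds = 20 observations here

# The vector could then be passed on, for example:
# obj <- escv.glmnet(x, y, foldid = foldid)
```

Fixing foldid this way removes the fold randomness between calls, which can help when comparing settings such as cv.OLS = TRUE and cv.OLS = FALSE on the same folds.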

Value

A list consisting of the following elements is returned.

lambda

The values of lambda used in the fits.

glmnet.fit

A fitted glmnet object for the full data.

cv

The mean cross-validated error - a vector of length length(lambda).

cv.error

Estimate of standard error of cv.

es

The mean estimation stability (es) value - a vector of length length(lambda).

es.error

Estimate of standard error of es.

lambda.cv

Value of lambda that gives minimum cv.

lambda.cv1se

Largest value of lambda such that cross-validated error is within 1 standard error of the minimum.

lambda.escv

Value of lambda selected by escv: the lambda giving the minimum es among the lambdas that are no less than lambda.cv.
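The three selection rules above can be illustrated on toy curves. This is a sketch that follows only the descriptions in this section, with made-up numbers; how escv.glmnet handles ties or other edge cases internally is not stated here and is not reproduced.

```r
# Toy curves on a decreasing lambda grid (made-up numbers, for illustration).
lambda   <- c(2, 1, 0.5, 0.25, 0.125, 0.0625)
cv       <- c(6.0, 5.0, 2.2, 2.0, 2.1, 2.6)        # mean cross-validated error
cv.error <- rep(0.3, length(lambda))               # standard error of cv
es       <- c(0.30, 0.10, 0.20, 0.40, 0.05, 0.60)  # mean estimation stability

# lambda.cv: lambda with the smallest cross-validated error.
i.cv <- which.min(cv)
lambda.cv <- lambda[i.cv]

# lambda.cv1se: largest lambda whose cv error is within one standard
# error of the minimum.
lambda.cv1se <- max(lambda[cv <= cv[i.cv] + cv.error[i.cv]])

# lambda.escv: smallest es among the lambdas that are no less than lambda.cv.
candidates <- which(lambda >= lambda.cv)
lambda.escv <- lambda[candidates[which.min(es[candidates])]]

c(lambda.cv = lambda.cv, lambda.cv1se = lambda.cv1se, lambda.escv = lambda.escv)
```

In this toy grid the overall smallest es sits at lambda = 0.125, but that value is below lambda.cv and is therefore excluded from the escv search.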

Details

The function is similar to cv.glmnet, and returns the values of lambda selected by cross-validation (cv), by cross-validation within 1 standard error (cv1se) and by estimation stability with cross-validation (escv). The function runs glmnet nfolds+1 times: the first run gets the lambda sequence, and the remaining runs compute the first-stage fit (i.e., Lasso) with each of the folds omitted. The error (cv and also es) is accumulated, and the average error and standard deviation over the folds are computed. Note that, as with cv.glmnet, the results of escv.glmnet are random, since the folds are selected at random. Users can reduce this randomness by running escv.glmnet many times and averaging the error curves.
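This page does not spell out how the es value is computed. As an assumption, the sketch below uses the estimation stability metric from the ESCV literature (Lim and Yu): for each lambda, the mean squared distance of the fold-wise fitted values from their average, divided by the squared norm of that average. The helper name es_metric and its input layout are hypothetical, and the internal code of escv.glmnet may differ.

```r
# Hypothetical helper: fits is a list with one n x nlambda matrix per fold,
# holding the fitted values on the full data from the fit with that fold omitted.
es_metric <- function(fits) {
  nfolds <- length(fits)
  # Average fitted values across folds, per observation and lambda.
  mean.fit <- Reduce(`+`, fits) / nfolds
  # Squared distance of each fold's fit from the average, per lambda.
  dev <- sapply(fits, function(f) colSums((f - mean.fit)^2))
  dev <- matrix(dev, ncol = nfolds)  # keep a matrix when nlambda is 1
  # Mean squared deviation, normalized by the squared norm of the average fit.
  rowMeans(dev) / colSums(mean.fit^2)
}

# Toy input: 3 folds, 4 observations, 2 lambdas.
set.seed(1)
base <- matrix(rnorm(8), nrow = 4)
fits <- list(base, base, base)
es_metric(fits)   # identical fits across folds give es = 0 for every lambda
```

Smaller values indicate that the fits from different folds agree with each other, which is the sense in which escv prefers a stable lambda.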

Examples


library("glmnet")
library("mvtnorm")
library("HDCI")  # assumed to be the package that provides escv.glmnet

## generate the data
set.seed(2015)
n <- 200      # number of obs
p <- 500
s <- 10
beta <- rep(0, p)
beta[1:s] <- runif(s, 1/3, 1)
x <- rmvnorm(n = n, mean = rep(0, p), method = "svd")
signal <- sqrt(mean((x %*% beta)^2))
sigma <- as.numeric(signal / sqrt(10))  # SNR=10
y <- x %*% beta + rnorm(n, sd = sigma)  # scale the noise so that SNR = 10

## escv without parallel
# using Lasso+OLS in the cv fit.
set.seed(0)
obj <- escv.glmnet(x, y, cv.OLS = TRUE) 

# using Lasso in the cv fit.
set.seed(0)
obj <- escv.glmnet(x, y)

## escv with parallel
#library("doParallel")
#library("doRNG")
#registerDoParallel(2)

# using Lasso+OLS in the cv fit.
#registerDoRNG(seed = 0)
#obj <- escv.glmnet(x, y, cv.OLS = TRUE, nfolds = 4, parallel = TRUE)

# using Lasso in the cv fit.
#registerDoRNG(seed = 0) 
#obj <- escv.glmnet(x, y, parallel = TRUE)
