pcLasso (version 1.1)

cv.pcLasso: Cross-validation for pcLasso

Description

Does k-fold cross-validation for pcLasso.

Usage

cv.pcLasso(x, y, w = rep(1, length(y)), ratio = NULL, theta = NULL,
  groups = vector("list", 1), family = "gaussian", nfolds = 10,
  foldid = NULL, keep = FALSE, verbose = FALSE, ...)

Arguments

x

x matrix as in pcLasso.

y

y matrix as in pcLasso.

w

Observation weights. Default is 1 for each observation.

ratio

Ratio of shrinkage between the second and first principal components in the absence of the \(\ell_1\) penalty. This is a more convenient way to specify the strength of the quadratic penalty. Must be greater than 0 and at most 1. ratio = 1 corresponds to the lasso; values of 0.5-0.9 are good choices. Default is NULL. Exactly one of ratio or theta must be specified.

theta

Multiplier for the quadratic penalty: a non-negative real number. theta = 0 corresponds to the lasso, and larger theta gives stronger shrinkage toward the top principal components. Default is NULL. Exactly one of ratio or theta must be specified.

groups

A list describing which features belong in each group. The length of the list should be equal to the number of groups, with groups[[k]] being a vector of feature/column numbers which belong to group k. Each feature must be assigned to at least one group. Features can belong to more than one group. By default, all the features belong to a single group.
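As a minimal sketch of the format described above, the following builds a groups list in which one feature belongs to two groups, then checks that every column is covered. The number of features (p = 6) and the group layout are made up for illustration.

```r
# Hypothetical layout: 6 features (columns of x) split into 3 groups,
# with feature 3 shared by groups 1 and 2 (overlap is allowed).
p <- 6
groups <- list(
  c(1, 2, 3),  # group 1
  c(3, 4),     # group 2 (overlaps with group 1 on feature 3)
  c(5, 6)      # group 3
)

# Every feature must appear in at least one group.
covered <- sort(unique(unlist(groups)))
all_covered <- identical(as.numeric(covered), as.numeric(seq_len(p)))

# Features that appear in more than one group.
membership_counts <- table(unlist(groups))
overlapping <- as.integer(names(membership_counts)[membership_counts > 1])
```

Running a check like this before calling cv.pcLasso can catch a feature that was left out of every group.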

family

Response type. Either "gaussian" (default) for linear regression or "binomial" for logistic regression.

nfolds

Number of folds for CV (default is 10). Although nfolds can be as large as the sample size (leave-one-out CV), it is not recommended for large datasets. Smallest value allowable is nfolds = 3.

foldid

An optional vector of values between 1 and nfolds identifying which fold each observation is in. If supplied, nfolds can be missing.
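One way to build such a vector, sketched here with assumed values of n and nfolds, is to repeat the fold labels to the sample size and shuffle them. When n is not a multiple of nfolds, the fold sizes then differ by at most one.

```r
set.seed(1)
n <- 103       # number of observations (assumed for illustration)
nfolds <- 5    # number of folds (assumed for illustration)

# Repeat the labels 1..nfolds out to length n, then shuffle, so that
# fold membership is random but the folds stay nearly equal in size.
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# Number of observations assigned to each fold.
fold_sizes <- table(foldid)
```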

keep

If keep = TRUE, a prevalidated array is returned containing fitted values for each observation at each value of lambda. Each of these fitted values comes from a model fit with that observation and the rest of its fold omitted. Default is FALSE.

verbose

Print out progress along the way? Default is FALSE.

...

Other arguments that can be passed to pcLasso.

Value

An object of class "cv.pcLasso", which is a list with the ingredients of the cross-validation fit.

glmfit

A fitted pcLasso object for the full data.

theta

Value of theta used in model fitting.

lambda

The values of lambda used in the fits.

nzero

If the groups overlap, the number of non-zero coefficients in the model glmfit for the expanded feature space, at each value of lambda. Otherwise, the number of non-zero coefficients in the model glmfit for the original feature space.

orignzero

If the groups are overlapping, this is the number of non-zero coefficients in the model glmfit for the original feature space, at each lambda. If groups are not overlapping, it is NULL.

fit.preval

If keep=TRUE, this is the array of prevalidated fits.

cvm

The mean cross-validated error: a vector of length length(lambda).

cvse

Estimate of standard error of cvm.

cvlo

Lower curve = cvm - cvse.

cvup

Upper curve = cvm + cvse.

lambda.min

The value of lambda that gives minimum cvm.

lambda.1se

The largest value of lambda such that the CV error is within one standard error of the minimum.
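The relationship between cvm, the standard error estimate, lambda.min and lambda.1se can be illustrated with toy numbers. The values below are invented and are not output from cv.pcLasso. The selection rule is written out from the descriptions above and may differ in detail from the package's internal code.

```r
# Toy values, invented for illustration only (not output from cv.pcLasso).
lambda <- c(1.0, 0.5, 0.25, 0.125, 0.0625)  # assumed decreasing sequence
cvm    <- c(2.00, 1.40, 1.10, 1.05, 1.20)   # mean CV error per lambda
cvse   <- c(0.10, 0.10, 0.10, 0.10, 0.10)   # standard error of cvm

# Lower and upper curves around the mean CV error.
cvlo <- cvm - cvse
cvup <- cvm + cvse

# lambda.min: the lambda with the smallest mean CV error.
i_min <- which.min(cvm)
lambda_min <- lambda[i_min]

# lambda.1se: the largest lambda whose CV error is within one
# standard error of the minimum.
threshold <- cvm[i_min] + cvse[i_min]
lambda_1se <- max(lambda[cvm <= threshold])
```

Here lambda_min is 0.125 and lambda_1se is 0.25, so the one-standard-error rule picks a more heavily penalized model.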

foldid

If keep=TRUE, the fold assignments used.

name

Name of error measurement used for CV.

call

The call that produced this object.

Details

This function runs pcLasso nfolds+1 times: the first to get the lambda sequence, and the remaining nfolds times to compute the fit with each of the folds omitted. The error is accumulated, and the mean error and standard deviation over the folds are computed. Note that cv.pcLasso does NOT search for values of theta or ratio. A specific value of theta or ratio should be supplied.
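The procedure can be sketched in base R as follows. This is an illustration of the general k-fold pattern, not the package's implementation: fit_path is a hypothetical stand-in that fits ridge regression rather than pcLasso, the lambda sequence is fixed by hand, and the unweighted per-fold mean squared error is an assumption.

```r
set.seed(1)
n <- 60; p <- 5; nfolds <- 5
x <- matrix(rnorm(n * p), n, p)
y <- drop(x %*% c(1, -1, 0, 0, 0.5)) + rnorm(n)
lambda <- c(10, 1, 0.1)

# Stand-in fitter (NOT pcLasso): ridge coefficients without an intercept,
# returned as a p x length(lambda) matrix, one column per lambda value.
fit_path <- function(x, y, lambda) {
  sapply(lambda, function(l) {
    solve(crossprod(x) + l * diag(ncol(x)), crossprod(x, y))
  })
}

# Fit 1 of nfolds + 1: the full-data fit. In cv.pcLasso this is the
# fit that determines the lambda sequence.
full_fit <- fit_path(x, y, lambda)

# Fits 2..(nfolds + 1): refit with each fold omitted, then score the
# omitted fold at every lambda.
foldid <- sample(rep(seq_len(nfolds), length.out = n))
fold_err <- matrix(NA_real_, nfolds, length(lambda))
for (k in seq_len(nfolds)) {
  held_out <- foldid == k
  beta <- fit_path(x[!held_out, , drop = FALSE], y[!held_out], lambda)
  pred <- x[held_out, , drop = FALSE] %*% beta
  fold_err[k, ] <- colMeans((y[held_out] - pred)^2)
}

cvm  <- colMeans(fold_err)                     # mean CV error per lambda
cvse <- apply(fold_err, 2, sd) / sqrt(nfolds)  # standard error of that mean
```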

See Also

pcLasso and plot.cv.pcLasso.

Examples

# NOT RUN {
set.seed(1)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
groups <- vector("list", 4)
for (k in 1:4) {
    groups[[k]] <- 5 * (k-1) + 1:5
}
cvfit1 <- cv.pcLasso(x, y, groups = groups, ratio = 0.8)

# change no. of CV folds
cvfit2 <- cv.pcLasso(x, y, groups = groups, ratio = 0.8, nfolds = 5)
# specify which observations are in each fold
foldid <- sample(rep(seq(5), length.out = length(y)))
cvfit3 <- cv.pcLasso(x, y, groups = groups, ratio = 0.8, foldid = foldid)

# keep=TRUE to have pre-validated fits and foldid returned
cvfit4 <- cv.pcLasso(x, y, groups = groups, ratio = 0.8, keep = TRUE)

# }
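Because cv.pcLasso does not search over theta or ratio, comparing several values has to be done by hand. The sketch below calls cv.pcLasso once per candidate ratio with the same foldid, so that all candidates are scored on identical folds. The candidate values and the choice of min(cvm) as the comparison criterion are assumptions made for illustration. The code runs only if the pcLasso package is installed.

```r
# Candidate ratio values (an arbitrary grid chosen for illustration).
ratios <- c(0.5, 0.7, 0.9, 1)

if (requireNamespace("pcLasso", quietly = TRUE)) {
  library(pcLasso)

  set.seed(1)
  x <- matrix(rnorm(100 * 20), 100, 20)
  y <- rnorm(100)
  groups <- vector("list", 4)
  for (k in 1:4) {
    groups[[k]] <- 5 * (k - 1) + 1:5
  }

  # Use one fold assignment for every ratio so the CV errors are comparable.
  foldid <- sample(rep(seq(5), length.out = length(y)))

  # Smallest mean CV error reached along the lambda path, per ratio.
  best_cvm <- sapply(ratios, function(r) {
    cvfit <- cv.pcLasso(x, y, groups = groups, ratio = r, foldid = foldid)
    min(cvfit$cvm)
  })

  # Ratio with the lowest CV error among the candidates.
  best_ratio <- ratios[which.min(best_cvm)]
}
```

Selecting ratio this way reuses the same CV errors for tuning and for reporting, so the minimum error found is likely to be optimistic.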
