spikeslab (version 1.1.4)

cv.spikeslab: K-fold Cross-Validation for Spike and Slab Regression

Description

Computes the K-fold cross-validated mean squared prediction error for the generalized elastic net from spike and slab regression. Returns a stability index for each variable.
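
As a rough illustration of what a K-fold cross-validated mean squared prediction error measures, the sketch below runs the procedure in base R on simulated data. It is not the package's implementation: `lm()` is used as a stand-in for the spike and slab fit, and the data, `fold.id`, and `cv.mse` names are assumptions made for this example.

```r
# Minimal K-fold cross-validation sketch in base R.
# Assumption: lm() stands in for the spike and slab / gnet fit.
set.seed(1)
n <- 100; p <- 5; K <- 10
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- drop(x %*% c(2, -1, 0, 0, 0.5)) + rnorm(n)
dat <- data.frame(y = y, x)

# Randomly assign each observation to one of K folds of equal size.
fold.id <- sample(rep(seq_len(K), length.out = n))

# For each fold: fit on the other K - 1 folds, then predict the held-out fold.
cv.mse <- vapply(seq_len(K), function(k) {
  test <- fold.id == k
  fit <- lm(y ~ ., data = dat[!test, , drop = FALSE])
  pred <- predict(fit, newdata = dat[test, , drop = FALSE])
  mean((dat$y[test] - pred)^2)
}, numeric(1))

# Average of the per-fold errors: the cross-validated prediction error.
mean(cv.mse)
```

Each observation is held out exactly once, so every prediction entering the error estimate comes from a model that never saw that observation.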

Usage

cv.spikeslab(x = NULL, y = NULL, K = 10, parallel = FALSE, 
    plot.it = TRUE, n.iter1 = 500, n.iter2 = 500, mse = TRUE,
    bigp.smalln = FALSE, bigp.smalln.factor = 1, screen = (bigp.smalln),
    r.effects = NULL, max.var = 500, center = TRUE, intercept = TRUE,
    fast = TRUE, beta.blocks = 5, verbose = TRUE, save.all = TRUE,
    ntree = 300, seed = NULL, ...)

Arguments

x
x-predictor matrix.
y
y-response values.
K
Number of folds.
parallel
Indicates whether computations should be performed using parallel processing. Parallel processing is implemented via the package snow. This package should be loaded prior to calling cv.spikeslab. When not set
plot.it
If TRUE, plots the mean prediction error and its standard error.
n.iter1
Number of burn-in Gibbs sampled values (i.e., discarded values).
n.iter2
Number of Gibbs sampled values, following burn-in.
mse
If TRUE, an external estimate for the overall variance is calculated.
bigp.smalln
Set to TRUE when p >> n (far more predictors than observations).
bigp.smalln.factor
Multiplier on the sample size n: the top n times this value variables are kept in the filtering step (used when p >> n).
screen
If TRUE, variables are first pre-filtered.
r.effects
List used for grouping variables (see details below).
max.var
Maximum number of variables allowed in the final model.
center
If TRUE, variables are centered by their means. Default is TRUE and should only be adjusted in extreme examples.
intercept
If TRUE, an intercept is included in the model, otherwise no intercept is included. Default is TRUE.
fast
If TRUE, use blocked Gibbs sampling to accelerate the algorithm.
beta.blocks
Update beta using this number of blocks (fast must be TRUE).
verbose
If TRUE, verbose output is sent to the terminal.
save.all
If TRUE, spikeslab object for each fold is saved and returned.
ntree
Number of trees used by random forests (applies only when mse is TRUE).
seed
Seed for random number generator. Must be a negative integer.
...
Further arguments passed to or from other methods.
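
To make the quantity plotted under plot.it concrete, the sketch below computes a mean prediction error and a standard error across folds from a small matrix of per-fold errors. The matrix values are invented for illustration, and the standard error formula (fold standard deviation divided by the square root of K) is an assumption; this page does not state how the package computes it.

```r
# Hypothetical per-fold errors: rows are model sizes, columns are K = 4 folds.
err <- matrix(c(9.0, 8.0, 10.0, 9.0,
                4.0, 5.0,  4.5, 4.5,
                2.0, 2.5,  1.5, 2.0,
                2.2, 2.4,  1.8, 2.4),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("size", 0:3), paste0("fold", 1:4)))

K <- ncol(err)
cv.mean <- rowMeans(err)               # mean prediction error per model size
cv.se <- apply(err, 1, sd) / sqrt(K)   # assumed standard error across folds

# Model size with the smallest mean cross-validated error.
best <- names(which.min(cv.mean))
best
```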

Value

Invisibly returns a list with components:

  • spikeslab.obj: Spike and slab object from the full data.
  • cv.spikeslab.obj: List containing spike and slab objects from each fold. Can be NULL.
  • cv.fold: List containing the cv splits.
  • cv: Mean-squared error for each fold for the gnet.
  • cv.path: A matrix of mean-squared errors for the gnet solution path. Rows correspond to model sizes, columns are the folds.
  • stability: Matrix containing stability for each variable, defined as the percentage of times a variable is identified over the K folds. Also includes bma and gnet coefficient values and their cv-fold-averaged values.
  • bma: bma coefficients from the full data in terms of the standardized x.
  • bma.scale: bma coefficients from the full data, scaled in terms of the original x.
  • gnet: cv-optimized gnet in terms of the standardized x.
  • gnet.scale: cv-optimized gnet in terms of the original x.
  • gnet.model: List of models selected by gnet over the K folds.
  • gnet.path: gnet path from the full data, scaled in terms of the original x.
  • gnet.obj: gnet object from fitting the full data (a lars-type object).
  • gnet.obj.vars: Variables (in order) used to calculate the gnet object.
  • verbose: Verbose details (used for printing).
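
The stability measure described above, the percentage of folds in which a variable is identified, can be sketched in base R as follows. The variable names and per-fold selections are hypothetical, and the code only illustrates the definition; it does not reproduce how the package builds its stability matrix.

```r
# Hypothetical variables and the subsets selected in each of K = 5 folds.
vars <- c("x1", "x2", "x3", "x4")
selected <- list(c("x1", "x2"),
                 c("x1"),
                 c("x1", "x3"),
                 c("x1", "x2"),
                 c("x1", "x2", "x3"))
K <- length(selected)

# Indicator matrix: one row per variable, one column per fold.
picked <- vapply(selected, function(s) vars %in% s, logical(length(vars)))
rownames(picked) <- vars

# Stability: percentage of the K folds in which each variable was selected.
stability <- 100 * rowMeans(picked)
stability
```

A variable selected in every fold scores 100, while one that is never selected scores 0.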

References

Ishwaran H. and Rao J.S. (2005a). Spike and slab variable selection: frequentist and Bayesian strategies. Ann. Statist., 33:730-773.

Ishwaran H. and Rao J.S. (2010). Generalized ridge regression: geometry and computational solutions when p is larger than n.

Ishwaran H. and Rao J.S. (2011). Mixing generalized ridge regressions.

See Also

sparsePC.spikeslab, plot.spikeslab, predict.spikeslab, print.spikeslab.

Examples

#------------------------------------------------------------
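# Example 1: 10-fold cross-validation using parallel processing
#------------------------------------------------------------

library(spikeslab)
library(snow)  # needed for parallel processing; load before calling cv.spikeslab

data(ozoneI, package = "spikeslab")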
y <- ozoneI[,  1]
x <- ozoneI[, -1]
cv.obj <- cv.spikeslab(x = x, y = y, parallel = 4)
plot(cv.obj, plot.type = "cv")
plot(cv.obj, plot.type = "path")

#------------------------------------------------------------
# Example 2: 10-fold cross-validation using parallel processing
# (high dimensional diabetes data)
#------------------------------------------------------------

# add 2000 noise variables
data(diabetesI, package = "spikeslab")
diabetes.noise <- cbind(diabetesI,
      noise = matrix(rnorm(nrow(diabetesI) * 2000), nrow(diabetesI)))
x <- diabetes.noise[, -1]
y <- diabetes.noise[, 1]

cv.obj <- cv.spikeslab(x = x, y = y, bigp.smalln=TRUE, parallel = 4)
plot(cv.obj)