spikeslab (version 1.0.2)

cv.spikeslab: K-fold Cross-Validation for Spike and Slab Regression

Description

Computes the K-fold cross-validated mean squared prediction error for the generalized elastic net from spike and slab regression. Returns a stability index for each variable.
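As a rough illustration of what a K-fold cross-validated mean squared prediction error is, the sketch below splits the data into K folds, fits a model on K-1 of them, and scores it on the held-out fold. It is not the package's implementation: `lm()` is used purely as a stand-in for the spike and slab fit, and the helper `kfold_mse` and the simulated data are hypothetical names introduced for this example.

```r
# Sketch of K-fold cross-validated MSE.
# Assumption: lm() stands in for the spike and slab / gnet fit.
kfold_mse <- function(x, y, K = 10, seed = 1) {
  set.seed(seed)
  n <- nrow(x)
  # Randomly assign each observation to one of K roughly equal folds.
  fold <- sample(rep(seq_len(K), length.out = n))
  dat <- data.frame(y = y, x)
  sapply(seq_len(K), function(k) {
    test <- fold == k
    # Fit on the K-1 training folds, predict on the held-out fold.
    fit <- lm(y ~ ., data = dat[!test, , drop = FALSE])
    pred <- predict(fit, newdata = dat[test, , drop = FALSE])
    mean((dat$y[test] - pred)^2)
  })
}

# Simulated data (hypothetical, for illustration only).
set.seed(42)
x <- matrix(rnorm(100 * 5), nrow = 100,
            dimnames = list(NULL, paste0("x", 1:5)))
y <- drop(x %*% c(2, -1, 0, 0, 0.5)) + rnorm(100)

fold_mse <- kfold_mse(x, y, K = 10)               # one MSE per fold
cv_mean  <- mean(fold_mse)                        # mean prediction error
cv_se    <- sd(fold_mse) / sqrt(length(fold_mse)) # its standard error
```

The per-fold vector plays the same role as a K-dimensional vector of fold errors, and the mean and standard error are the kind of summary a cross-validation plot typically displays.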

Usage

cv.spikeslab(x = NULL, y = NULL, K = 10, plot.it = TRUE,
    n.iter1 = 500, n.iter2 = 500, mse = TRUE,
    bigp.smalln = FALSE, bigp.smalln.factor = 1, screen = (bigp.smalln),
    r.effects = NULL, max.var = 500, center = TRUE, intercept = TRUE,
    fast = TRUE, beta.blocks = 5, verbose = TRUE, ntree = 300,
    seed = NULL, ...)

Arguments

x
x-predictor matrix.
y
y-response values.
K
Number of folds.
plot.it
If TRUE, plots the mean prediction error and its standard error.
n.iter1
Number of burn-in Gibbs sampled values (i.e., discarded values).
n.iter2
Number of Gibbs sampled values, following burn-in.
mse
If TRUE, an external estimate for the overall variance is calculated.
bigp.smalln
Set to TRUE when p >> n (many more variables than observations).
bigp.smalln.factor
Multiplier for the filtering step: the top n times this value variables are kept (used when p >> n).
screen
If TRUE, variables are first pre-filtered.
r.effects
List used for grouping variables (see details below).
max.var
Maximum number of variables allowed in the final model.
center
If TRUE, variables are centered by their means. Default is TRUE and should only be adjusted in extreme examples.
intercept
If TRUE, an intercept is included in the model, otherwise no intercept is included. Default is TRUE.
fast
If TRUE, use blocked Gibbs sampling to accelerate the algorithm.
beta.blocks
Update beta using this number of blocks (fast must be TRUE).
verbose
If TRUE, verbose output is sent to the terminal.
ntree
Number of trees used by random forests (applies only when mse is TRUE).
seed
Seed for random number generator. Must be a negative integer.
...
Further arguments passed to or from other methods.

Value

Invisibly returns a list with components:

  • cv: K-dimensional vector of mean-squared errors for the gnet.
  • cv.path: A matrix of mean-squared errors for the gnet solution path. Rows correspond to model sizes, columns are the folds.
  • stability: Stability for each variable defined as the percentage of times a variable is identified over the K-folds.
  • gnet.path: gnet path from the full data, scaled in terms of the original x.
  • gnet.obj: gnet object from fitting the full data (a lars-type object).
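To make the layout of these components concrete, the sketch below summarizes a matrix shaped like cv.path (rows are model sizes, columns are folds) and computes a stability percentage as defined above. All values are mock data, and the object names (`cv_path`, `selected`, `best_size`) are hypothetical; this does not reproduce how the package computes or uses these quantities internally.

```r
# Mock matrix shaped like cv.path: rows are model sizes, columns are folds.
# Values are random placeholders, not output from cv.spikeslab.
set.seed(1)
K <- 10
cv_path <- matrix(runif(6 * K, min = 0.5, max = 1.5), nrow = 6, ncol = K,
                  dimnames = list(paste0("size", 0:5), paste0("fold", 1:K)))

# Mean error and standard error across folds for each model size.
mean_mse <- rowMeans(cv_path)
se_mse   <- apply(cv_path, 1, sd) / sqrt(ncol(cv_path))

# One possible use: the model size with the smallest mean error.
best_size <- names(which.min(mean_mse))

# Mock selection indicators: TRUE if a variable was identified in a fold.
selected <- matrix(c(TRUE,  TRUE,  TRUE,  TRUE,
                     TRUE,  FALSE, TRUE,  FALSE,
                     FALSE, FALSE, FALSE, TRUE),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("v1", "v2", "v3"), paste0("fold", 1:4)))

# Stability: percentage of folds in which each variable was identified.
stability <- 100 * rowMeans(selected)
```

In this mock example `v1` is identified in every fold and `v3` in only one of four, so a higher stability value indicates a variable that is selected more consistently across folds.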

References

Ishwaran H. and Rao J.S. (2005a). Spike and slab variable selection: frequentist and Bayesian strategies. Ann. Statist., 33:730-773.

Ishwaran H. and Rao J.S. (2009). Generalized ridge regression: geometry and computational solutions when p is larger than n.

See Also

sparsePC.spikeslab, predict.spikeslab, print.spikeslab.

Examples

#------------------------------------------------------------
# Example:  10-fold cross-validation
#------------------------------------------------------------

data(diabetesI, package = "spikeslab")
y <- diabetesI[,  1]
x <- diabetesI[, -1]
cv.obj <- cv.spikeslab(x = x, y = y)
print(head(cv.obj$stability, 25))
