stabsel: Stability Selection

Description

Selection of influential variables or model components with error control.

Usage

## a method to compute stability selection paths for fitted mboost models
## S3 method for class 'mboost':
stabsel(x, cutoff, q, PFER,
        folds = subsample(model.weights(x), B = B),
        B = ifelse(sampling.type == "MB", 100, 50),
        assumption = c("unimodal", "r-concave", "none"),
        sampling.type = c("SS", "MB"),
        papply = mclapply, verbose = TRUE, FWER, eval = TRUE, ...)
## just a wrapper to stabsel(p, ..., eval = FALSE)
## S3 method for class 'mboost':
stabsel_parameters(p, ...)

Arguments

x, p

a fitted model of class "mboost".

cutoff

cutoff between 0.5 and 1. Preferably a value between 0.6 and 0.9 should be used.

q

number of (unique) selected variables (or groups of variables, depending on the model) that are selected on each subsample.

PFER

upper bound for the per-family error rate, i.e., the expected number of falsely selected base-learners that is tolerated. See details.
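For sampling.type = "MB", the error bound of Meinshausen & Buehlmann (2010) links the three tuning parameters: PFER <= q^2 / ((2 * cutoff - 1) * p), with p the number of base-learners, so fixing any two of cutoff, q and PFER determines the third. The base-R sketch below assumes only this bound. The helper names mb_pfer, mb_cutoff and mb_q are hypothetical (not part of mboost or stabs), and the Shah & Samworth (2013) bounds used for sampling.type = "SS" are not covered.

```r
## Hypothetical helpers (NOT part of mboost/stabs) for the
## Meinshausen & Buehlmann (2010) bound  PFER <= q^2 / ((2 * cutoff - 1) * p).

## upper bound on the expected number of false selections
mb_pfer <- function(q, cutoff, p) {
  stopifnot(cutoff > 0.5, cutoff <= 1)
  q^2 / ((2 * cutoff - 1) * p)
}

## smallest cutoff that keeps the bound at or below PFER
mb_cutoff <- function(q, PFER, p) {
  (q^2 / (PFER * p) + 1) / 2
}

## largest integer q that keeps the bound at or below PFER;
## the small tolerance guards floor() against floating-point rounding
mb_q <- function(cutoff, PFER, p) {
  floor(sqrt(PFER * (2 * cutoff - 1) * p) + 1e-8)
}

## p = 10 base-learners (9 predictors + intercept)
mb_cutoff(q = 3, PFER = 1, p = 10)     # 0.95
mb_pfer(q = 3, cutoff = 0.95, p = 10)  # 1
mb_q(cutoff = 0.95, PFER = 1, p = 10)  # 3
```

With p = ncol(bodyfat) - 1 + 1 as in the Examples, q = 3 and PFER = 1 give a cutoff of 0.95 under this bound.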

folds

a weight matrix with number of rows equal to the number of observations, see cvrisk and subsample. Usually one should not change the default here, as subsampling with a fraction of 1/2 is needed for the error bounds to hold.
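Each column of folds encodes one subsample as 0/1 case weights. As a rough sketch of that shape, assuming simple random halving, a matrix of this kind can be built in base R. The helper half_subsamples is hypothetical and is not the implementation of mboost's subsample(), which starts from model.weights(x).

```r
## Hypothetical helper (NOT mboost's subsample()): an n x B matrix of 0/1
## weights; each column selects floor(n / 2) observations at random.
half_subsamples <- function(n, B) {
  folds <- matrix(0L, nrow = n, ncol = B)
  for (b in seq_len(B)) {
    folds[sample.int(n, size = floor(n / 2)), b] <- 1L
  }
  folds
}

set.seed(1)
f <- half_subsamples(n = 20, B = 5)
dim(f)              # 20 5
unique(colSums(f))  # 10
```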

assumption

Defines the type of assumptions on the distributions of the selection probabilities and simultaneous selection probabilities. Only applicable for sampling.type = "SS". For sampling.type = "MB" we always use "none".

sampling.type

use the sampling scheme of Shah & Samworth (2013), i.e., with complementary pairs (sampling.type = "SS"), or the original sampling scheme of Meinshausen & Buehlmann (2010) (sampling.type = "MB").

B

number of subsampling replicates. By default, we use 50 complementary pairs for the error bounds of Shah & Samworth (2013) and 100 subsamples for the error bound derived in Meinshausen & Buehlmann (2010). As we use B complementary pairs in the former case, this leads to 2B subsamples.
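To see why B complementary pairs yield 2B subsamples, the sketch below draws B random halves and appends their complements, so column b and column b + B together partition the observations. The helper complementary_pairs is hypothetical and only illustrates the idea; it is not the code used by mboost or stabs.

```r
## Hypothetical helper: B complementary pairs of half-samples -> 2B columns.
## Column b + B is the complement of column b.
complementary_pairs <- function(n, B) {
  half <- matrix(0L, nrow = n, ncol = B)
  for (b in seq_len(B)) {
    half[sample.int(n, size = floor(n / 2)), b] <- 1L
  }
  cbind(half, 1L - half)
}

set.seed(1)
cp <- complementary_pairs(n = 20, B = 50)
ncol(cp)                      # 100 = 2 * B subsamples
all(cp[, 1] + cp[, 51] == 1)  # TRUE: each pair partitions the data
```

With the default B = 50 for sampling.type = "SS", this gives 100 subsamples, the same number as the default B = 100 for sampling.type = "MB".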

papply

(parallel) apply function, defaults to mclapply. Alternatively, parLapply can be used. In the latter case, usually more setup is needed (see example of …).
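One way to use parLapply is to wrap it so that the cluster object is fixed in advance and the wrapper is called like lapply. This sketch assumes that papply receives the list to iterate over as its first argument and the function as its second (an assumption about how stabsel calls it). The helper make_cluster_apply is hypothetical; makeCluster, parLapply, clusterEvalQ and stopCluster come from the parallel package.

```r
library("parallel")

## Hypothetical wrapper with an lapply-like signature (X, FUN, ...),
## dispatching the work to an existing cluster via parLapply().
make_cluster_apply <- function(cl) {
  function(X, FUN, ...) parLapply(cl, X, FUN, ...)
}

cl <- makeCluster(2)
## workers start as fresh R sessions; for stabsel() they would also need
## mboost loaded, e.g. clusterEvalQ(cl, library("mboost"))
my_apply <- make_cluster_apply(cl)
res <- my_apply(1:3, function(i, power) i^power, power = 2)
stopCluster(cl)
unlist(res)  # 1 4 9
```

Fixing the cluster inside a closure avoids having to pass cl through the remaining arguments on every call.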

verbose

logical (default: TRUE) that determines whether warnings should be issued.

FWER

deprecated. Only for compatibility with older versions, use PFER instead.

eval

logical. Determines whether stability selection is evaluated (eval = TRUE; default) or if only the parameter combination is returned.

...

additional arguments to parallel apply methods such as mclapply and to cvrisk.

Value

An object of class stabsel with a special print method. The object has the following elements:
phat: selection probabilities.
selected: elements with maximal selection probability greater than cutoff.
max: maximum of selection probabilities.
cutoff: cutoff used.
q: average number of selected variables used.
PFER: per-family error rate.
sampling.type: the sampling type used for stability selection.
assumption: the assumptions made on the selection probabilities.
call: the call.
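The elements phat, max and selected are related. Assuming phat is a matrix with one row per base-learner and one column per step of the selection path, max is the row-wise maximum, and selected picks the rows whose maximum exceeds cutoff, following the description above. The numbers below are a made-up toy matrix with hypothetical names bl1 to bl3, not output of stabsel.

```r
## Toy selection-probability paths (made-up numbers, NOT real output):
## rows = base-learners, columns = steps of the selection path.
phat <- rbind(
  bl1 = c(0.10, 0.60, 0.97),
  bl2 = c(0.05, 0.40, 0.92),
  bl3 = c(0.00, 0.10, 0.20)
)
cutoff <- 0.95

maxsel <- apply(phat, 1, max)       # counterpart of element max
selected <- which(maxsel > cutoff)  # counterpart of element selected
selected  # bl1 = 1
```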

Details

For details see stabsel in package stabs and Hofner et al. (2014).

References

B. Hofner, L. Boccuto and M. Goeker (2014), Controlling false discoveries in high-dimensional situations: Boosting with stability selection. Technical Report, arXiv:1411.1285. http://arxiv.org/abs/1411.1285.

N. Meinshausen and P. Buehlmann (2010), Stability selection. Journal of the Royal Statistical Society, Series B, 72, 417--473.

R.D. Shah and R.J. Samworth (2013), Variable selection with error control: another look at stability selection. Journal of the Royal Statistical Society, Series B, 75, 55--80.

Examples


  ## load mboost and make the data set available
  library("mboost")
  data("bodyfat", package = "TH.data")
  ## set seed
  set.seed(1234)

  ### low-dimensional example
  mod <- glmboost(DEXfat ~ ., data = bodyfat)

  ## compute the cutoff ahead of running stabsel to see whether it is a
  ## sensible parameter choice.
  ## p = number of base-learners
  ##   = ncol(bodyfat) - 1 (outcome) + 1 (intercept)
  stabsel_parameters(q = 3, PFER = 1, p = ncol(bodyfat) - 1 + 1,
                     sampling.type = "MB")
  ## the same:
  stabsel(mod, q = 3, PFER = 1, sampling.type = "MB", eval = FALSE)

  ## now run stability selection
  (sbody <- stabsel(mod, q = 3, PFER = 1, sampling.type = "MB"))
  ## enlarge the right margin so that the base-learner names fit
  opar <- par(mai = par("mai") * c(1, 1, 1, 2.7))
  plot(sbody)
  par(opar)

  plot(sbody, type = "maxsel", ymargin = 6)
