# stabsel

##### Stability Selection

Selection of influential variables or model components with error control.

Keywords: nonparametric

##### Usage

```
## a method to compute stability selection paths for fitted mboost models
# S3 method for mboost
stabsel(x, cutoff, q, PFER, grid = 0:mstop(x),
        folds = subsample(model.weights(x), B = B),
        B = ifelse(sampling.type == "MB", 100, 50),
        assumption = c("unimodal", "r-concave", "none"),
        sampling.type = c("SS", "MB"),
        papply = mclapply, verbose = TRUE, FWER, eval = TRUE, ...)

## just a wrapper to stabsel(p, ..., eval = FALSE)
# S3 method for mboost
stabsel_parameters(p, ...)
```

##### Arguments

- x, p
a fitted model of class `"mboost"`.

- cutoff
cutoff between 0.5 and 1. Preferably a value between 0.6 and 0.9 should be used.

- q
number of (unique) selected variables (or groups of variables depending on the model) that are selected on each subsample.

- PFER
upper bound for the per-family error rate, i.e., the expected number of falsely selected base-learners that is tolerated. See details.

- grid
a numeric vector of the form `0:m`. See also `cvrisk`.

- folds
a weight matrix with number of rows equal to the number of observations, see `cvrisk` and `subsample`. Usually one should not change the default here, as subsampling with a fraction of \(1/2\) is needed for the error bounds to hold. One scenario where specifying the folds by hand might be appropriate is dependent data (e.g., clusters), where one wants to draw whole clusters (i.e., multiple rows together) rather than individual observations.

- assumption
Defines the type of assumptions on the distributions of the selection probabilities and simultaneous selection probabilities. Only applicable for `sampling.type = "SS"`. For `sampling.type = "MB"` we always use `"none"`.

- sampling.type
use the sampling scheme of Shah & Samworth (2013), i.e., with complementary pairs (`sampling.type = "SS"`), or the original sampling scheme of Meinshausen & Buehlmann (2010) (`sampling.type = "MB"`).

- B
number of subsampling replicates. Per default, we use 50 complementary pairs for the error bounds of Shah & Samworth (2013) and 100 for the error bound derived in Meinshausen & Buehlmann (2010). As we use \(B\) complementary pairs in the former case, this leads to \(2B\) subsamples.

- papply
(parallel) apply function, defaults to `mclapply`. Alternatively, `parLapply` can be used. In the latter case, usually more setup is needed (see the example of `cvrisk` for some details).

- verbose
logical (default: `TRUE`) that determines whether `warnings` should be issued.

- FWER
deprecated. Only for compatibility with older versions, use PFER instead.

- eval
logical. Determines whether stability selection is evaluated (`eval = TRUE`; default) or if only the parameter combination is returned.

- ...
additional arguments to parallel apply methods such as `mclapply` and to `cvrisk`.
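As noted under `folds`, dependent data may call for subsampling whole clusters instead of individual rows. The base-R sketch below builds such a weight matrix. `cluster_subsample()` is a hypothetical helper, not part of mboost or stabs, and it assumes that each replicate should contain half of the clusters (mirroring the subsampling fraction of \(1/2\)), with one row per observation and one column per replicate.

```r
## Hypothetical helper (not part of mboost or stabs): each column is one
## subsampling replicate that contains half of the clusters; all rows of a
## drawn cluster get weight 1, all other rows weight 0.
cluster_subsample <- function(cluster, B = 50) {
  cluster <- as.factor(cluster)
  ids <- levels(cluster)
  n_half <- floor(length(ids) / 2)
  sapply(seq_len(B), function(b) {
    drawn <- sample(ids, n_half)       # draw half of the clusters
    as.numeric(cluster %in% drawn)     # expand to row-level 0/1 weights
  })
}

## Usage with 10 hypothetical clusters of 3 rows each
set.seed(1)
cl_id <- rep(1:10, each = 3)
folds <- cluster_subsample(cl_id, B = 50)
dim(folds)   # 30 50

## for a fitted model `mod` on these rows, one could then call e.g.
## stabsel(mod, q = 3, PFER = 1, folds = folds)
```

Drawing at the cluster level keeps all rows of a cluster together in every replicate, which is the point of specifying `folds` by hand for clustered data.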

##### Details

For details see `stabsel` in package stabs and Hofner et al. (2015).
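To illustrate how `q`, `PFER` and `cutoff` constrain each other, the sketch below implements the error bound of Meinshausen & Buehlmann (2010), \(\mathrm{PFER} \le q^2 / ((2\,\mathrm{cutoff} - 1)\, p)\), where \(p\) is the number of base-learners. This is the bound for `sampling.type = "MB"`, where no further assumptions are made; the tighter `"unimodal"` and `"r-concave"` bounds of Shah & Samworth (2013) are not reproduced here. `mb_pfer()` and `mb_cutoff()` are hypothetical names, not functions of stabs; `stabsel_parameters()` remains the way to obtain the actual parameters.

```r
## Hypothetical helpers (not the stabs API): the error bound of
## Meinshausen & Buehlmann (2010) without further assumptions,
##   PFER <= q^2 / ((2 * cutoff - 1) * p),
## valid for 0.5 < cutoff <= 1.
mb_pfer <- function(q, cutoff, p) {
  stopifnot(cutoff > 0.5, cutoff <= 1)
  q^2 / ((2 * cutoff - 1) * p)
}

## Solve the same bound for the cutoff, given q and a target PFER.
mb_cutoff <- function(q, PFER, p) {
  cutoff <- (q^2 / (p * PFER) + 1) / 2
  if (cutoff > 1)
    stop("no cutoff <= 1 attains this PFER; decrease q or increase PFER")
  cutoff
}

## q = 3 and PFER = 1 as in the Examples; p = 10 is an illustrative value
mb_cutoff(q = 3, PFER = 1, p = 10)      # 0.95
mb_pfer(q = 3, cutoff = 0.95, p = 10)   # 1
```

The bound shows why large `q` relative to \(p\) quickly becomes infeasible: the required cutoff grows with \(q^2\) and exceeds 1 once \(q^2 > p \cdot \mathrm{PFER}\).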

##### Value

An object of class `stabsel` with a special `print` method.
The object has the following elements:

- selection probabilities.
- elements with maximal selection probability greater than `cutoff`.
- maximum of selection probabilities.
- cutoff used.
- average number of selected variables used.
- per-family error rate.
- the sampling type used for stability selection.
- the assumptions made on the selection probabilities.
- the call.
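The relation between selection probabilities, their maximum and the `cutoff` can be illustrated with a simplified, hypothetical computation: for each subsample, take the base-learners chosen up to iteration \(m\), keep at most the first `q` distinct ones, average the selection indicators over subsamples to get a selection probability per step, take the maximum over the grid, and compare it with `cutoff`. The input `paths` and all object names are made up for illustration; this is not mboost's internal code and the names do not correspond to the elements of the returned object.

```r
## Hypothetical input: for each of B = 4 subsamples, the base-learner
## chosen in each of the first 4 boosting iterations.
paths <- list(c("a", "b", "a", "c"),
              c("a", "a", "b", "b"),
              c("b", "a", "d", "a"),
              c("a", "c", "c", "b"))
learners <- c("a", "b", "c", "d")
q <- 2          # number of distinct base-learners allowed per subsample
cutoff <- 0.7
grid <- 1:4     # iterations at which selection is recorded

## base-learners selected up to iteration m, truncated to the first q distinct ones
selected_by <- function(path, m, q) {
  chosen <- unique(path[seq_len(m)])
  chosen[seq_len(min(q, length(chosen)))]
}

## selection probabilities: rows = base-learners, columns = grid steps
sel_prob <- sapply(grid, function(m) {
  sapply(learners, function(j) {
    mean(sapply(paths, function(pth) j %in% selected_by(pth, m, q)))
  })
})

max_prob <- apply(sel_prob, 1, max)           # maximum over the grid
stable <- names(which(max_prob > cutoff))     # compare with the cutoff
max_prob   # a = 1, b = 0.75, c = 0.25, d = 0
stable     # "a" "b"
```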

##### References

B. Hofner, L. Boccuto and M. Goeker (2015),
Controlling false discoveries in high-dimensional situations: Boosting
with stability selection. *BMC Bioinformatics*, **16:144**.

N. Meinshausen and P. Buehlmann (2010), Stability selection.
*Journal of the Royal Statistical Society, Series B*,
**72**, 417--473.

R.D. Shah and R.J. Samworth (2013), Variable selection with error
control: another look at stability selection. *Journal of the Royal
Statistical Society, Series B*, **75**, 55--80.

##### Examples

```
## load the package and make the data set available
library("mboost")
data("bodyfat", package = "TH.data")
## set seed
set.seed(1234)

### low-dimensional example
mod <- glmboost(DEXfat ~ ., data = bodyfat)

## compute cutoff ahead of running stabsel to see if it is a sensible
## parameter choice.
## p = ncol(bodyfat) - 1 (= outcome) + 1 (= intercept)
stabsel_parameters(q = 3, PFER = 1, p = ncol(bodyfat) - 1 + 1,
                   sampling.type = "MB")
## the same:
stabsel(mod, q = 3, PFER = 1, sampling.type = "MB", eval = FALSE)

############################################################
## Not run and checked automatically, as the following takes
## some time (~ 10 seconds depending on the system)

## now run stability selection
(sbody <- stabsel(mod, q = 3, PFER = 1, sampling.type = "MB"))
## enlarge the right margin so the variable names fit
opar <- par(mai = par("mai") * c(1, 1, 1, 2.7))
plot(sbody)
par(opar)
plot(sbody, type = "maxsel", ymargin = 6)
```
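The `papply` argument mentions `parLapply` as an alternative to `mclapply`. The following is a minimal sketch of a wrapper with the `(X, FUN, ...)` interface shared by `lapply()`-style functions, running on a local cluster from the base package parallel. `par_apply` is a hypothetical name; the sketch assumes that stabsel calls `papply` like `lapply(X, FUN, ...)` and that no `mclapply`-specific arguments (such as `mc.cores`) are passed through `...`, since `parLapply()` would forward them to `FUN`. On real use, mboost must also be loaded on the workers.

```r
library("parallel")

## local cluster with two worker processes
cl <- makeCluster(2)

## hypothetical papply-compatible wrapper around parLapply()
par_apply <- function(X, FUN, ...) {
  parLapply(cl = cl, X = X, fun = FUN, ...)
}

## quick check with a toy function
res <- par_apply(1:3, function(i) i^2)
unlist(res)   # 1 4 9

## with mboost (not run here): load the package on the workers first
## clusterEvalQ(cl, library("mboost"))
## stabsel(mod, q = 3, PFER = 1, sampling.type = "MB", papply = par_apply)

stopCluster(cl)
```

The wrapper captures `cl` from its defining environment, so the cluster has to exist for as long as stabsel runs and should be stopped afterwards.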

*Documentation reproduced from package mboost, version 2.9-1, License: GPL-2*