crch.stabsel: Auxiliary functions to perform stability selection using boosting.

Description

Auxiliary function for performing stability selection on heteroscedastic crch models, based on crch.boost.

Usage

crch.stabsel(formula, data, ..., nu = 0.1, q, B = 100, thr = 0.9, 
  maxit = 2000, data_percentage = 0.5)

Value

Returns an object of class "stabsel.crch" containing the stability selection summary and the new formula based on the stability selection.

table: A table object containing the parameters which have been selected and the corresponding frequency of selection.
formula.org: Original formula used to perform the stability selection.
formula.new: New formula containing only the parameters selected during stability selection.
family: A list object which contains the distribution-specification from the crch.stabsel call including: dist, cens, and truncated.
parameter: List with the parameters used to perform the stability selection including q, B, thr, p, and PFER (per-family error rate).
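
As an illustration of how q, thr, and p relate to PFER, the sketch below evaluates the upper bound from Meinshausen and Buehlmann (2010), E(V) <= q^2 / ((2 * thr - 1) * p). It assumes that crch.stabsel reports this bound and that p counts the candidate parameters of location and scale without intercepts; neither assumption is confirmed by this page. The helper pfer_bound is hypothetical and not part of crch.

```r
# Upper bound on the expected number of falsely selected parameters
# (Meinshausen and Buehlmann, 2010): E(V) <= q^2 / ((2 * thr - 1) * p).
# `pfer_bound` is a hypothetical helper for illustration, not a crch function.
pfer_bound <- function(q, thr, p) {
  stopifnot(thr > 0.5, thr <= 1, q > 0, p > 0)
  q^2 / ((2 * thr - 1) * p)
}

# Assumed setting: 20 location + 20 scale candidates, as in y ~ .|. with 20 regressors
pfer_bound(q = 8, thr = 0.9, p = 40)

# A lower threshold loosens the bound
pfer_bound(q = 8, thr = 0.6, p = 40)
```

The bound shrinks as thr approaches 1 and grows quadratically in q, which is why q is usually chosen small relative to p.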

Arguments

formula: a formula expression of the form y ~ x | z where y is the response and x and z are regressor variables for the location and the scale of the fitted distribution respectively.
data: an optional data frame containing the variables occurring in the formulas.
...: Additional arguments to control the crch model. Note that control is not allowed; crch.stabsel uses crch.boost by default.
nu: Boosting step size (see crch.boost). The default is 0.1, as for crch.boost; lower values often yield better results and should be considered.
q: Positive numeric. Maximum number of parameters to be selected during each iteration (not including intercepts).
B: Numeric. Total number of stability selection iterations, i.e., the number of subsamples drawn.
thr: Numeric threshold between 0.5 and 1.0. Used to generate the new formula and to compute the per-family error rate.
maxit: Positive numeric value. Maximum number of iterations for the boosting algorithm. If q is not reached before maxit, the algorithm stops.
data_percentage: Fraction of the data to be sampled in each iteration. Default (and suggested) is 0.5.

Details

crch.boost performs gradient boosting on heteroscedastic additive models. crch.stabsel is a wrapper around the core crch.boost algorithm that performs stability selection (see references).

By default, half of the data set (data) is sampled B times to perform boosting (based on crch.boost). Rather than running the boosting iterations until a certain stopping criterion is reached (e.g., the maximum number of iterations maxit), the algorithm stops as soon as q parameters have been selected. The number of parameters is counted across both the location and the scale. Intercepts are not counted.
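
The procedure described above can be sketched in base R. This is a simplified illustration, not the crch implementation: the hypothetical helper boost_until_q uses componentwise least-squares boosting on the location only, whereas crch.boost updates location and scale coefficients of a censored or truncated likelihood. The names boost_until_q and stabsel_sketch, and the simulated data, are assumptions made for this example.

```r
# Hypothetical helper, not part of crch: componentwise least-squares boosting
# on the location only, stopped as soon as q distinct regressors are selected.
boost_until_q <- function(X, y, q, nu = 0.1, maxit = 2000) {
  X <- scale(X)                       # standardize columns so updates are comparable
  res <- y - mean(y)                  # residuals after fitting the intercept
  selected <- character(0)
  for (it in seq_len(maxit)) {
    # least-squares coefficient of each standardized column on the residuals
    beta <- drop(crossprod(X, res)) / (nrow(X) - 1)
    j <- which.max(abs(beta))         # column giving the largest drop in RSS
    res <- res - nu * beta[j] * X[, j]
    selected <- union(selected, colnames(X)[j])
    if (length(selected) >= q) break  # stop once q parameters are selected
  }
  selected
}

# Hypothetical wrapper: subsample B times, tally selection frequencies,
# and keep the regressors whose frequency reaches the threshold thr.
stabsel_sketch <- function(X, y, q, B = 100, thr = 0.9, data_percentage = 0.5) {
  n <- nrow(X)
  counts <- setNames(numeric(ncol(X)), colnames(X))
  for (b in seq_len(B)) {
    idx <- sample.int(n, size = floor(data_percentage * n))  # without replacement
    sel <- boost_until_q(X[idx, , drop = FALSE], y[idx], q = q)
    counts[sel] <- counts[sel] + 1
  }
  freq <- sort(counts / B, decreasing = TRUE)
  keep <- names(freq)[freq >= thr]
  list(table = freq,
       formula.new = if (length(keep)) reformulate(keep, response = "y") else y ~ 1)
}

# Simulated data: only V1 and V2 carry signal
set.seed(1)
X <- matrix(rnorm(500 * 10), 500, 10, dimnames = list(NULL, paste0("V", 1:10)))
y <- 1 + X[, 1] - 1.5 * X[, 2] + rnorm(500)

res <- stabsel_sketch(X, y, q = 3, B = 20)
res$table        # selection frequency per regressor
res$formula.new  # formula built from regressors with frequency >= thr
```

Stopping at q selected parameters instead of at a fixed iteration count keeps the number of selected parameters per subsample bounded, which is what the error bound of stability selection relies on.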

References

Meinshausen N, Buehlmann P (2010). Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417-473. doi:10.1111/j.1467-9868.2010.00740.x

Examples

# generate data
suppressWarnings(RNGversion("3.5.0"))
set.seed(5)
x <- matrix(rnorm(1000*20),1000,20)
y <- rnorm(1000, 1 + x[,1] - 1.5 * x[,2], exp(-1 + 0.3*x[,3]))
y <- pmax(0, y)
data <- data.frame(cbind(y, x))

# fit model with maximum likelihood
CRCH1 <- crch(y ~ .|., data = data, dist = "gaussian", left = 0)

# Perform stability selection
stabsel <- crch.stabsel(y ~ .|.,  data = data, dist = "gaussian", left = 0,
           q = 8, B = 5)

# Show stability selection summary
print(stabsel); plot(stabsel)

CRCH2 <- crch(stabsel$formula.new, data = data, dist = "gaussian", left = 0 )
BOOST <- crch(stabsel$formula.new, data = data, dist = "gaussian", left = 0,
              control = crch.boost() )

### Log-likelihood comparison
sapply( list(CRCH1,CRCH2,BOOST), logLik )
