crch (version 1.0-4)

crch.stabsel: Auxiliary functions to perform stability selection using boosting.

Description

Auxiliary function to perform stability selection on heteroscedastic crch models based on crch.boost.

Usage

crch.stabsel(formula, data, ..., nu = 0.1, q, B = 100, thr = 0.9, 
  maxit = 2000, data_percentage = 0.5)

Value

Returns an object of class "stabsel.crch" containing the stability selection summary and the new formula based on the stability selection.

table

A table object containing the parameters which have been selected and the corresponding frequency of selection.

formula.org

Original formula used to perform the stability selection.

formula.new

New formula including only the terms selected during stability selection.

family

A list object which contains the distribution-specification from the crch.stabsel call including: dist, cens, and truncated.

parameter

List with the parameters used to perform the stability selection including q, B, thr, p, and PFER (per-family error rate).
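
The PFER entry ties q, thr, and p together. The following is a minimal sketch in base R, assuming the standard bound from Meinshausen and Buehlmann (2010), q^2 / ((2 * thr - 1) * p), where p is taken to be the number of candidate non-intercept parameters. The helper name pfer_bound is hypothetical and not part of crch; the value stored by crch.stabsel may be computed differently.

```r
# Hypothetical helper (not part of crch): upper bound on the expected
# number of falsely selected parameters, assuming the bound of
# Meinshausen and Buehlmann (2010): E(V) <= q^2 / ((2 * thr - 1) * p).
pfer_bound <- function(q, thr, p) {
  stopifnot(q > 0, p > 0, thr > 0.5, thr <= 1)
  q^2 / ((2 * thr - 1) * p)
}

# Example (assumed setup): 20 regressors in both location and scale
# give p = 40 candidate parameters.
pfer_bound(q = 8, thr = 0.9, p = 40)
```

Under this bound, raising thr or lowering q tightens the error control, at the cost of selecting fewer terms.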

Arguments

formula

a formula expression of the form y ~ x | z where y is the response and x and z are regressor variables for the location and the scale of the fitted distribution respectively.

data

an optional data frame containing the variables occurring in the formulas.

...

Additional arguments to control the crch model. Note that control is not allowed; crch.stabsel uses crch.boost by default.

nu

Boosting step size (see crch.boost). The default is 0.1, as for crch.boost; lower values often yield better results and should be considered.

q

Positive numeric. Maximum number of parameters to be selected during each iteration (not including intercepts).

B

numeric, total number of subsampling iterations.

thr

numeric threshold (between 0.5 and 1.0). Used to generate the new formula and to compute the per-family error rate.

maxit

Positive numeric value. Maximum number of iterations for the boosting algorithm. If q is not reached before maxit, the algorithm stops.

data_percentage

Fraction of the data to be sampled in each iteration. Default (and suggested) is 0.5.

Details

crch.boost performs gradient boosting on heteroscedastic additive models. crch.stabsel is a wrapper around the core crch.boost algorithm to perform stability selection (see references).

A subsample of the data set (data; half of it by default, see data_percentage) is drawn B times to perform boosting (based on crch.boost). Rather than running the boosting iterations until a stopping criterion is reached (e.g., the maximum number of iterations maxit), the algorithm stops as soon as q parameters have been selected. The number of selected parameters is counted across both the location and the scale part. Intercepts are not counted.
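
The subsampling scheme described above can be sketched in base R. This is an illustration only: the base learner below is a stand-in that keeps the q regressors with the largest absolute correlation to the response. It is not crch.boost and ignores the scale part of the model. The function names select_top_q and stability_freq and the simulated data are hypothetical.

```r
# Stand-in base learner (NOT crch.boost): rank regressors by absolute
# correlation with the response and keep the top q.
select_top_q <- function(y, X, q) {
  score <- abs(drop(cor(X, y)))
  names(sort(score, decreasing = TRUE))[seq_len(q)]
}

# Stability selection loop: draw B subsamples, select q terms on each,
# and record how often every regressor is chosen.
stability_freq <- function(y, X, q, B = 100, data_percentage = 0.5) {
  n <- nrow(X)
  m <- floor(data_percentage * n)
  counts <- setNames(numeric(ncol(X)), colnames(X))
  for (b in seq_len(B)) {
    idx <- sample.int(n, m)  # subsample without replacement
    sel <- select_top_q(y[idx], X[idx, , drop = FALSE], q)
    counts[sel] <- counts[sel] + 1
  }
  counts / B
}

# Simulated data: only V1 and V2 carry signal.
set.seed(1)
X <- matrix(rnorm(500 * 10), 500, 10,
            dimnames = list(NULL, paste0("V", 1:10)))
y <- 1 + X[, 1] - 1.5 * X[, 2] + rnorm(500)

freq <- stability_freq(y, X, q = 3, B = 50)
selected <- names(freq)[freq >= 0.9]  # keep terms at or above thr = 0.9
selected
```

Terms that are picked in nearly every subsample pass the threshold, while regressors that enter only occasionally are dropped. crch.stabsel uses the same idea to build the new formula from the selection frequencies.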

References

Meinshausen N, Buehlmann P (2010). Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417--473. doi:10.1111/j.1467-9868.2010.00740.x.

See Also

crch, crch.boost

Examples

# generate data
suppressWarnings(RNGversion("3.5.0"))
set.seed(5)
x <- matrix(rnorm(1000*20),1000,20)
y <- rnorm(1000, 1 + x[,1] - 1.5 * x[,2], exp(-1 + 0.3*x[,3]))
y <- pmax(0, y)
data <- data.frame(cbind(y, x))

# fit model with maximum likelihood
CRCH1 <- crch(y ~ .|., data = data, dist = "gaussian", left = 0)

# Perform stability selection
stabsel <- crch.stabsel(y ~ .|.,  data = data, dist = "gaussian", left = 0,
           q = 8, B = 5)

# Show stability selection summary
print(stabsel); plot(stabsel)

CRCH2 <- crch(stabsel$formula.new, data = data, dist = "gaussian", left = 0 )
BOOST <- crch(stabsel$formula.new, data = data, dist = "gaussian", left = 0,
              control = crch.boost() )

### log-likelihood comparison
sapply( list(CRCH1,CRCH2,BOOST), logLik )
