Provides an interface for deriving sparse prediction ensembles where basis functions are selected through L1 penalization.
Usage

gpe(formula, data, base_learners = list(gpe_trees(), gpe_linear()),
    weights = rep(1, times = nrow(data)), sample_func = gpe_sample(),
    verbose = FALSE, penalized_trainer = gpe_cv.glmnet(), model = TRUE)

Arguments

formula: Symbolic description of the model to be fit, of the form y ~ x1 + x2 + ... + xn. If the output variable (left-hand side of the formula) is a factor, an ensemble for binary classification is created. Otherwise, an ensemble for prediction of a continuous variable is created.
data: data.frame containing the variables in the model.
base_learners: List of functions that each have the formal arguments formula, data, weights, sample_func, verbose, and family, and return a character vector of terms for the final formula passed to cv.glmnet. See gpe_linear, gpe_trees, and gpe_earth.
weights: Case weights with length equal to the number of rows in data.
sample_func: Function used to sample observations when learning with the base learners. The function should have formal arguments n and weights and return a vector of indices. See gpe_sample.
verbose: TRUE if progress information should be printed during the computations.
penalized_trainer: Function with formal arguments x, y, weights, and family which returns a fit object. It can be replaced to test other penalized trainers (e.g., functions that apply an L2 or elastic net penalty instead of the L1 penalty); a sketch of such a replacement is shown after this argument list. Note that not using cv.glmnet may cause other functions for gpe objects to fail. See gpe_cv.glmnet.
model: TRUE if the data should be added to the returned object.
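The default penalized_trainer is constructed with gpe_cv.glmnet and wraps cv.glmnet. As a rough illustration of the required signature, a minimal sketch of a replacement is given below, assuming the glmnet package is installed; the name ridge_trainer and the choice alpha = 0 (an L2 penalty) are assumptions for illustration only. The sketch still returns a cv.glmnet fit, so other functions for gpe objects should remain usable.

library(glmnet)

## Illustrative custom penalized trainer: same formal arguments as the default
## (x, y, weights, family), but fits a ridge (L2) penalty instead of the lasso.
ridge_trainer <- function(x, y, weights, family) {
  glmnet::cv.glmnet(x = x, y = y, weights = weights, family = family,
                    alpha = 0)  # alpha = 0 selects the L2 (ridge) penalty
}

## Hypothetical use:
## gpe(y ~ x1 + x2 + x3, data = data, penalized_trainer = ridge_trainer)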
Value

An object of class gpe.
Details

gpe provides a more general framework for deriving a sparse prediction ensemble than pre. A fit similar to that of pre can be estimated with the following call:
gpe(formula = y ~ x1 + x2 + x3, data = data, base_learners = list(gpe_linear(), gpe_trees()))
Products of hinge functions using MARS can be added to the ensemble above with the following call:
gpe(formula = y ~ x1 + x2 + x3, data = data, base_learners = list(gpe_linear(), gpe_trees(), gpe_earth()))
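For concreteness, a self-contained sketch of the calls above is given below, using simulated data; the data-generating code and variable names are illustrative assumptions (the example assumes the pre package is installed, plus the earth package for gpe_earth()).

library(pre)

set.seed(42)
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 + sin(3 * dat$x2) + rnorm(n)  # numeric outcome: regression ensemble

## Ensemble with linear terms, rules from trees, and MARS hinge functions
fit <- gpe(formula = y ~ x1 + x2 + x3, data = dat,
           base_learners = list(gpe_linear(), gpe_trees(), gpe_earth()))
fit  # printing the object summarizes the penalized ensemble

## A factor outcome would instead yield a binary classification ensemble, e.g.:
## dat$y2 <- factor(dat$y > median(dat$y))
## fit2 <- gpe(formula = y2 ~ x1 + x2 + x3, data = dat)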
Other custom base learners can be implemented; see gpe_trees, gpe_linear, or gpe_earth for details of the setup. The sampling function given by sample_func can also be replaced by a custom sampling function; see gpe_sample for details of the setup. Rough sketches of both are given below.
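The following sketch illustrates the contracts described above: a custom sampling function with formal arguments n and weights that returns indices, and a toy base learner that returns character terms. The names half_sample and gpe_squares and the specific choices are assumptions; the authoritative requirements are documented in gpe_sample, gpe_trees, gpe_linear, and gpe_earth.

## Custom sampler: draws half of the observations without replacement,
## with selection probabilities proportional to the case weights.
half_sample <- function(n, weights) {
  sample.int(n, size = max(1L, floor(n / 2)), replace = FALSE, prob = weights)
}

## Toy base learner: returns character terms (here, squared numeric predictors)
## for the final formula passed to cv.glmnet. It accepts the formal arguments
## described above even though only formula and data are used here.
gpe_squares <- function(formula, data, weights, sample_func, verbose, family) {
  x_names <- attr(terms(formula, data = data), "term.labels")
  numeric_x <- x_names[vapply(data[x_names], is.numeric, logical(1))]
  paste0("I(", numeric_x, "^2)")
}

## Hypothetical use:
## gpe(y ~ x1 + x2 + x3, data = dat,
##     base_learners = list(gpe_linear(), gpe_squares), sample_func = half_sample)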
References

Friedman, J. H., & Popescu, B. E. (2008). Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3), 916-954.
See Also

pre, gpe_trees, gpe_linear, gpe_earth, gpe_sample, gpe_cv.glmnet