gpe_trees: Learner Function Generators for gpe

Description

Functions to get "learner" functions for gpe.

Usage

gpe_trees(..., remove_duplicates_complements = TRUE, mtry = Inf,
  ntrees = 500, maxdepth = 3L, learnrate = 0.01, parallel = FALSE,
  use_grad = TRUE, tree.control = ctree_control(mtry = mtry, maxdepth =
  maxdepth))
gpe_linear(..., winsfrac = 0.025, normalize = TRUE)
gpe_earth(..., degree = 3, nk = 8, normalize = TRUE, ntrain = 100,
  learnrate = 0.1, cor_thresh = 0.99)

Arguments

...

Currently not used.

remove_duplicates_complements

Logical. Should rules with complementary or duplicate support be removed? Defaults to TRUE.

mtry

Number of input variables randomly sampled as candidates at each node for random forest like algorithms. The argument is passed to the tree methods in the partykit package.

ntrees

Number of trees to fit. Will not have an effect if tree.control is used.

maxdepth

Maximum depth of trees. Will not have an effect if tree.control is used.

learnrate

Learning rate for methods. Corresponds to the \(\nu\) parameter in Friedman & Popescu (2008).

parallel

Logical. Should basis functions be found in parallel? Defaults to FALSE.

use_grad

Logical. For binary outcomes, should gradient boosting with regression trees be used when learnrate > 0? If TRUE, ctree is used instead of glmtree, as in Friedman (2001), with a second-order Taylor expansion instead of a first-order one, as in Chen and Guestrin (2016). Defaults to TRUE.

tree.control

A ctree_control object with options for the ctree function.

winsfrac

Quantile at which to winsorize linear terms. The value should be in \([0,0.5)\).

normalize

Logical. Should values be scaled by 0.4 times the inverse standard deviation? If TRUE, linear terms get the same influence as a typical rule. Defaults to TRUE.

degree

Maximum degree of interactions in earth model.

nk

Maximum number of basis functions in earth model.

ntrain

Number of models to fit.

cor_thresh

A threshold on the pairwise correlation for removal of basis functions. This is similar to remove_duplicates_complements. One of the basis functions in pairs where the correlation exceeds the threshold is excluded. NULL implies no exclusion. Setting a value closer to zero will decrease the time needed to fit the final model.
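
The exclusion that cor_thresh describes can be illustrated with a minimal base-R sketch. The helper name drop_correlated is hypothetical, and both the greedy keep-first order and the use of absolute correlations are assumptions made for illustration; the procedure used inside gpe_earth may differ.

```r
# Hypothetical helper (not part of the package): greedily drop columns
# whose absolute pairwise correlation with an already kept column
# exceeds cor_thresh.
drop_correlated <- function(X, cor_thresh = 0.99) {
  if (is.null(cor_thresh)) return(X)   # NULL implies no exclusion
  C <- abs(cor(X))
  keep <- integer(0)
  for (j in seq_len(ncol(X))) {
    # Keep column j only if it is not too correlated with any kept column
    if (all(C[j, keep] <= cor_thresh)) keep <- c(keep, j)
  }
  X[, keep, drop = FALSE]
}

set.seed(1)
x1 <- rnorm(100)
X <- cbind(a = x1, b = x1 + rnorm(100, sd = 0.001), c = rnorm(100))

kept <- colnames(drop_correlated(X))                    # "b" nearly duplicates "a"
kept_all <- colnames(drop_correlated(X, cor_thresh = NULL))
```

In this sketch a lower threshold removes more columns, which is consistent with the note that values closer to zero reduce the time needed to fit the final model.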

Value

A function that has formal arguments formula, data, weights, sample_func, verbose, family, .... The function returns a character vector in which each element is a term for the final formula in the call to cv.glmnet.
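
To make the shape of the returned function concrete, here is a minimal sketch of a generator that follows the same pattern. The name toy_learner is hypothetical, the argument values in the call are placeholders, and the sketch only illustrates the formal arguments and the character-vector return value. It does not reproduce the term syntax that the package's own learners emit.

```r
# Hypothetical generator (not part of the package): returns a learner
# with the formal arguments listed above. The learner simply proposes
# every predictor on the right-hand side of the formula as a term.
toy_learner <- function(...) {
  function(formula, data, weights, sample_func, verbose, family, ...) {
    vars <- attr(terms(formula, data = data), "term.labels")
    if (verbose) message("Proposing ", length(vars), " terms")
    vars   # character vector, one element per term
  }
}

learner <- toy_learner()
arg_names <- names(formals(learner))

# Placeholder values; weights, sample_func and family are unused here
out <- learner(Ozone ~ Wind + Temp, data = airquality,
               weights = rep(1, nrow(airquality)),
               sample_func = NULL, verbose = FALSE, family = "gaussian")
```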

Details

gpe_trees provides learners for tree methods. Either ctree or glmtree from the partykit package will be used.

gpe_linear provides linear terms for the gpe.
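
The effect of winsfrac and normalize on a single linear term can be sketched in base R. The helper linear_term is hypothetical, and computing the standard deviation on the winsorized values is an assumption; the package's internal computation may differ in detail.

```r
# Hypothetical helper (not the package's code): winsorize a numeric
# vector and optionally scale it by 0.4 / sd.
linear_term <- function(x, winsfrac = 0.025, normalize = TRUE) {
  stopifnot(winsfrac >= 0, winsfrac < 0.5)
  # Winsorize: clamp values to the winsfrac and 1 - winsfrac quantiles
  bounds <- quantile(x, c(winsfrac, 1 - winsfrac), names = FALSE)
  x_w <- pmin(pmax(x, bounds[1]), bounds[2])
  # Normalize: scale by 0.4 times the inverse standard deviation
  if (normalize) x_w <- 0.4 * x_w / sd(x_w)
  x_w
}

set.seed(1)
x <- c(rnorm(98), -50, 50)                 # two extreme outliers
z <- linear_term(x)                        # winsorized and scaled
w <- linear_term(x, normalize = FALSE)     # winsorized only
```

After winsorizing, the two outliers no longer dominate the range of the term, and after scaling the term has a standard deviation of 0.4.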

gpe_earth provides basis functions where each factor is a hinge function. The model is estimated with earth.
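
As an illustration of a basis function built from hinge factors, the following base-R sketch multiplies two hinge functions. The helper hinge, the knots, and the data are made up for the example and do not come from a fitted earth model.

```r
# Hinge function: max(0, sign * (x - knot))
hinge <- function(x, knot, sign = 1) pmax(0, sign * (x - knot))

x1 <- c(1, 3, 5)
x2 <- c(10, 20, 30)

# A degree-2 basis function: the product of two hinge factors,
# max(0, x1 - 2) * max(0, 25 - x2)
basis <- hinge(x1, knot = 2) * hinge(x2, knot = 25, sign = -1)
```

The product is nonzero only where both factors are active, here only for the second observation.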

References

Hothorn, T., & Zeileis, A. (2015). partykit: A modular toolkit for recursive partytioning in R. Journal of Machine Learning Research, 16, 3905-3909.

Friedman, J. H. (1991). Multivariate adaptive regression splines. The Annals of Statistics, 19(1), 1-67.

Friedman, J. H. (2001). Greedy function approximation: a gradient boosting machine. The Annals of Statistics, 29(5), 1189-1232.

Friedman, J. H. (1993). Fast MARS. Dept. of Statistics Technical Report No. 110, Stanford University.

Friedman, J. H., & Popescu, B. E. (2008). Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3), 916-954.

Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM.