gbm.holdout
gbm holdout
Calculates a gradient boosting model (gbm) object in which model complexity is determined using a training set, with predictions made to a withheld set. An initial set of trees is fitted, and trees are then added incrementally, testing performance along the way with gbm.perf, until the optimal number of trees is identified.
Because any structured ordering of the data should be avoided, a copy of the data set is, by default, randomly reordered each time the function is run.
Keywords
spatial
Usage
gbm.holdout(data, gbm.x, gbm.y, learning.rate = 0.001, tree.complexity = 1,
family = "bernoulli", n.trees = 200, add.trees = n.trees, max.trees = 20000,
verbose = TRUE, train.fraction = 0.8, permute = TRUE, prev.stratify = TRUE,
var.monotone = rep(0, length(gbm.x)), site.weights = rep(1, nrow(data)),
refit = TRUE, keep.data = TRUE)
Arguments
- data
data.frame
- gbm.x
indices of the predictors in the input dataframe
- gbm.y
index of the response in the input dataframe
- learning.rate
typically varied between 0.1 and 0.001
- tree.complexity
the number of splits allowed in each tree; sometimes called interaction depth
- family
"bernoulli", "poisson", etc., as for gbm
- n.trees
initial number of trees
- add.trees
number of trees to add at each increment
- max.trees
maximum number of trees to fit
- verbose
controls degree of screen reporting
- train.fraction
proportion of data to use for training
- permute
if TRUE, randomly reorder the data before fitting
- prev.stratify
stratify selection for presence/absence data
- var.monotone
allows the response to be constrained to be monotone with respect to individual predictors
- site.weights
observation weights; set equal to 1 by default
- refit
refit the model using the full data set, but with the identified number of trees
- keep.data
keep copy of the data
Value
A gbm object
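Examples

A minimal sketch of a typical call, assuming the dismo package (which provides gbm.holdout) and its bundled Anguilla_train data set are available; the column indices follow the conventions of the dismo boosted-regression-tree examples.

```r
library(dismo)
data(Anguilla_train)

# Response (Angaus presence/absence) is in column 2;
# predictors are in columns 3 to 13.
mod <- gbm.holdout(Anguilla_train, gbm.x = 3:13, gbm.y = 2,
                   learning.rate = 0.005, tree.complexity = 2)

# Number of trees identified from performance on the withheld fraction
mod$n.trees
```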
References
Elith, J., J.R. Leathwick and T. Hastie, 2008. A working guide to boosted regression trees. Journal of Animal Ecology 77: 802-813.