gbmt_fit: GBMT fit

Description

Fits a generalized boosted model directly from a predictor data frame or matrix and an outcome. It is intended for "power" users with a large number of variables who wish to avoid calling model.frame, which can be slow in that setting.
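To make the trade-off concrete, the sketch below times the model.frame() step that a formula interface has to perform on a wide data set. It compares that with splitting the predictors and response by hand, which is the form of input gbmt_fit takes. The data set and its size are arbitrary assumptions, timings will vary by machine, and no gbm3 function is called.

```r
set.seed(1)
n <- 500   # rows (hypothetical)
p <- 1000  # predictor columns (hypothetical)

# Wide, purely synthetic data: columns V1..V1000 plus a response y
dat <- as.data.frame(matrix(rnorm(n * p), nrow = n, ncol = p))
dat$y <- rnorm(n)

# Formula route: model.frame() expands the formula and builds a model frame
t_formula <- system.time(mf <- model.frame(y ~ ., data = dat))

# Direct route: split predictors and response yourself, the shape of input
# that gbmt_fit(x, y, ...) accepts
t_direct <- system.time({
  x <- dat[, setdiff(names(dat), "y"), drop = FALSE]
  y <- dat$y
})

# Elapsed seconds for each route (machine dependent)
c(formula = t_formula[["elapsed"]], direct = t_direct[["elapsed"]])
```

Passing x and y built this way to gbmt_fit skips the model-frame step entirely.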

Usage

gbmt_fit(
  x,
  y,
  distribution = gbm_dist("Gaussian"),
  weights = rep(1, nrow(x)),
  offset = rep(0, nrow(x)),
  train_params = training_params(num_trees = 100, interaction_depth = 3,
    min_num_obs_in_node = 10, shrinkage = 0.001, bag_fraction = 0.5, id =
    seq_len(nrow(x)), num_train = round(0.5 * nrow(x)), num_features = ncol(x)),
  response_name = "y",
  var_monotone = NULL,
  var_names = NULL,
  keep_gbm_data = FALSE,
  cv_folds = 1,
  cv_class_stratify = FALSE,
  fold_id = NULL,
  par_details = getOption("gbm.parallel"),
  is_verbose = FALSE
)
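The default train_params shown above is computed from x. The sketch below uses a hypothetical helper, default_train_settings(), written in base R (it is not part of the package), to evaluate those default expressions for a given x. Because R's round() rounds halves to the even digit, the default num_train for an odd number of rows can fall on either side of n/2.

```r
# Hypothetical helper: evaluates the default expressions shown in the Usage
# section of gbmt_fit for a given predictor matrix or data frame x
default_train_settings <- function(x) {
  list(
    id           = seq_len(nrow(x)),      # one id per row
    num_train    = round(0.5 * nrow(x)),  # round(n / 2) rows for training
    num_features = ncol(x)                # equals ncol(x), every predictor column
  )
}

x5 <- matrix(rnorm(5 * 3), nrow = 5)  # 5 rows, 3 columns
x7 <- matrix(rnorm(7 * 3), nrow = 7)  # 7 rows, 3 columns

default_train_settings(x5)$num_train  # round(2.5) -> 2
default_train_settings(x7)$num_train  # round(3.5) -> 4
```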

Value

a GBMFit object.

Arguments

x: a data frame or data matrix containing the predictor variables.
y: a matrix of outcomes. For every distribution except CoxPH this matrix collapses to a vector; for CoxPH it is a survival object in which the event times fill the first column (or first two columns) and the status fills the final column. The length of the first dimension of y must match the number of rows in x.
distribution: a GBMDist object specifying the distribution and any additional parameters needed.
weights: optional vector of weights used in the fitting process. These weights must be positive but need not be normalized. By default they are set to 1 for each data row.
offset: optional vector specifying the model offset. It defaults to a vector of 0s whose length equals the number of rows of x.
train_params: a GBMTrainParams object which specifies the parameters used in growing decision trees.
response_name: a string specifying the name of the response; defaults to "y".
var_monotone: optional vector, the same length as the number of predictors, indicating the relationship each variable has with the outcome: monotone increasing (+1), monotone decreasing (-1), or arbitrary.
var_names: a vector of strings containing the names of the predictor variables.
keep_gbm_data: a logical specifying whether the gbm_data object created in this method should be stored in the results.
cv_folds: a positive integer specifying the number of folds to use in cross-validation of the gbm fit. Cross-validation is performed only if cv_folds > 1; the default is 1.
cv_class_stratify: a logical specifying whether to stratify the cross-validation folds by the response outcome. Currently this applies only to the "Bernoulli" distribution; defaults to FALSE.
fold_id: an optional vector of values identifying which fold each observation is in. If supplied, cv_folds can be missing. Note: multiple rows of the same observation must have the same fold_id.
par_details: details of the parallelization to use in the core algorithm; defaults to getOption("gbm.parallel").
is_verbose: if TRUE, progress and performance of the fit are printed.
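For the CoxPH case of y, one way to build the survival object is survival::Surv() from the survival package. Using that package is an assumption here; the page only asks for "a survival object". With right-censored data the event time fills the first column. With (start, stop] counting-process data the two time columns come first. In both cases the status is the last column. All data below are synthetic.

```r
library(survival)

set.seed(1)
n <- 50
# Hypothetical predictor data frame
x <- data.frame(age = rnorm(n, 60, 10), dose = runif(n))

# Right-censored data: event time in column 1, status in the final column
time   <- rexp(n, rate = 0.1)
status <- rbinom(n, 1, 0.7)
y_right <- Surv(time, status)

# Counting-process (start, stop] data: two time columns, then status
start <- runif(n, 0, 1)
stop  <- start + rexp(n, rate = 0.1)   # stop is always after start
y_count <- Surv(start, stop, status)

# The first dimension of y must match the number of rows in x
stopifnot(nrow(y_right) == nrow(x), nrow(y_count) == nrow(x))
dim(y_right)  # 50 rows, 2 columns
dim(y_count)  # 50 rows, 3 columns
```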
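The weights argument only has to be positive, not normalized. The base R sketch below builds the documented default weight vector and a custom one. It then illustrates why normalization is unnecessary for a weighted least-squares quantity such as a weighted mean: rescaling every weight by the same constant leaves the result unchanged. This is an illustration of the general idea only, not a description of how gbm3 uses weights internally; the data are synthetic.

```r
set.seed(3)
n <- 8
x <- data.frame(a = rnorm(n), b = rnorm(n))  # hypothetical predictors
y <- rnorm(n)

# The documented default: weight 1 for each data row
w_default <- rep(1, nrow(x))

# Custom weights: positive but not normalized (last 4 rows count double)
w <- c(rep(1, 4), rep(2, 4))
stopifnot(all(w > 0), length(w) == nrow(x))

# Unit weights reproduce the ordinary mean
all.equal(weighted.mean(y, w_default), mean(y))    # TRUE

# Multiplying all weights by a constant leaves the weighted mean unchanged
all.equal(weighted.mean(y, w), weighted.mean(y, 10 * w))  # TRUE
```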
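var_monotone must line up with the predictor columns, which is easy to get wrong on wide data. The sketch below builds it from constraints keyed by column name, leaving unlisted columns unconstrained, and takes var_names from the same column order. The column names and constraints are hypothetical. Using 0 for "arbitrary" follows the convention of the original gbm package and is an assumption here.

```r
set.seed(5)
# Hypothetical predictors
x <- data.frame(price = runif(20), age = runif(20), region_score = runif(20))

# Hypothetical constraints keyed by column name; unlisted columns stay free
constraints <- c(price = -1, age = 1)

# One entry per column of x, in column order: +1 increasing,
# -1 decreasing, 0 arbitrary (assumed encoding)
var_monotone <- setNames(rep(0, ncol(x)), colnames(x))
var_monotone[names(constraints)] <- constraints
stopifnot(all(var_monotone %in% c(-1, 0, 1)))

var_names <- colnames(x)   # names in the same order as var_monotone
unname(var_monotone)       # -1 1 0
```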
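Stratifying by the response means each cross-validation fold receives roughly the same share of every outcome class, which matters when one class of a binary outcome is rare. The base R sketch below shows the idea for a synthetic, imbalanced binary outcome. It is only an illustration of stratified fold assignment, not the package's own code. A vector built this way could also be supplied through fold_id.

```r
set.seed(7)
n <- 40
k <- 5                          # number of CV folds (hypothetical)
y <- rbinom(n, 1, 0.25)         # imbalanced binary outcome

# Assign folds separately within each class: cycle through 1..k, then
# shuffle, so each fold gets a near-equal share of every class
fold_id <- integer(n)
for (cls in unique(y)) {
  idx <- which(y == cls)
  fold_id[idx] <- sample(rep_len(seq_len(k), length(idx)))
}

table(outcome = y, fold = fold_id)
```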
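When one observation spans several rows, all of its rows must share a fold_id. A simple way to guarantee this is to draw one fold per distinct observation and map it back onto the rows, as in the base R sketch below. The subject identifiers, row counts, and number of folds are hypothetical.

```r
set.seed(42)
# Hypothetical long-format data: 30 rows drawn from 12 subjects, so some
# subjects contribute several rows
subject <- sample(1:12, 30, replace = TRUE)
cv_folds <- 4

# Draw one fold per distinct subject, then map it back onto every row
subjects <- unique(subject)
subject_fold <- sample(rep_len(seq_len(cv_folds), length(subjects)))
fold_id <- subject_fold[match(subject, subjects)]

# Check: no subject is split across folds
folds_per_subject <- tapply(fold_id, subject, function(f) length(unique(f)))
stopifnot(all(folds_per_subject == 1))
```

The resulting vector is what would be passed as fold_id, in which case cv_folds can be omitted.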