Fits a generalized boosting model. This function is intended for "power" users who have a large number of variables and wish to avoid calling model.frame, which can be slow in that situation.
gbmt_fit(
  x,
  y,
  distribution = gbm_dist("Gaussian"),
  weights = rep(1, nrow(x)),
  offset = rep(0, nrow(x)),
  train_params = training_params(num_trees = 100, interaction_depth = 3,
    min_num_obs_in_node = 10, shrinkage = 0.001, bag_fraction = 0.5,
    id = seq_len(nrow(x)), num_train = round(0.5 * nrow(x)),
    num_features = ncol(x)),
  response_name = "y",
  var_monotone = NULL,
  var_names = NULL,
  keep_gbm_data = FALSE,
  cv_folds = 1,
  cv_class_stratify = FALSE,
  fold_id = NULL,
  par_details = getOption("gbm.parallel"),
  is_verbose = FALSE
)
a GBMFit object.
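As a rough illustration of the usage above, the following sketch fits a Gaussian model to simulated data. The data, the `gbm3::` package prefix, and the variable names `x1` and `x2` are assumptions made for this example; the argument values simply repeat the defaults shown in the usage. The call is guarded so the script still runs when the package is not installed.

```r
# Simulated data (hypothetical): two predictors and a noisy linear outcome.
set.seed(42)
n <- 200
x <- data.frame(x1 = runif(n), x2 = rnorm(n))
y <- 2 * x$x1 + x$x2 + rnorm(n, sd = 0.1)

# Assumption: gbmt_fit, gbm_dist and training_params are exported by a
# package named gbm3. Skip the fit if that package is unavailable.
if (requireNamespace("gbm3", quietly = TRUE)) {
  fit <- gbm3::gbmt_fit(
    x = x,
    y = y,
    distribution = gbm3::gbm_dist("Gaussian"),
    train_params = gbm3::training_params(
      num_trees = 100, interaction_depth = 3, min_num_obs_in_node = 10,
      shrinkage = 0.001, bag_fraction = 0.5,
      id = seq_len(nrow(x)), num_train = round(0.5 * nrow(x)),
      num_features = ncol(x)
    )
  )
  print(class(fit))
}
```

Because x and y are passed directly, no formula is parsed and model.frame is never called.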
a data frame or data matrix containing the predictor variables.
a matrix of outcomes. For every distribution except CoxPH, this matrix collapses to a vector. For CoxPH it is a survival object in which the event times fill the first one (or two) columns and the status fills the final column. The length of the first dimension of y must match the number of rows in x.
a GBMDist object specifying the distribution and any additional parameters needed.
optional vector of weights used in the fitting process. These weights must be positive but need not be normalized. By default they are set to 1 for each data row.
optional vector specifying the model offset; must be positive. This defaults to a vector of 0's whose length equals the number of rows of x.
a GBMTrainParams object which specifies the parameters used in growing decision trees.
a string specifying the name of the response - defaults to "y".
optional vector, the same length as the number of predictors, indicating the relationship each variable has with the outcome. Each entry specifies a monotone increasing (+1), monotone decreasing (-1), or arbitrary (0) relationship.
a vector of strings containing the names of the predictor variables.
a bool specifying whether or not the gbm_data object created in this method should be stored in the results.
a positive integer specifying the number of folds to be used in cross-validation of the gbm fit. If cv_folds > 1 then cross-validation is performed; the default of cv_folds is 1.
a bool specifying whether or not to stratify the cross-validation folds by response outcome. Currently this only applies to the "Bernoulli" distribution, and it defaults to FALSE.
an optional vector of values identifying which fold each observation is in. If supplied, cv_folds can be missing. Note: multiple rows of the same observation must have the same fold_id.
Details of the parallelization to use in the core algorithm.
a bool; if TRUE, progress and performance of the fit are printed.
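The fold_id requirement above (all rows of one observation share a fold) can be met by assigning folds per observation rather than per row. The sketch below uses base R only; `subject_id`, `n_folds`, and the six-row layout are hypothetical and stand in for whatever identifies repeated observations in your data.

```r
# Hypothetical layout: 6 rows coming from 3 subjects, two rows each.
subject_id <- c(1, 1, 2, 2, 3, 3)
n_folds <- 3

# Draw one fold label per distinct subject, then expand back to rows,
# so that rows of the same subject can never land in different folds.
set.seed(1)
subjects <- unique(subject_id)
fold_of_subject <- sample(rep(seq_len(n_folds), length.out = length(subjects)))
fold_id <- fold_of_subject[match(subject_id, subjects)]

# Sanity check: each subject maps to exactly one fold.
folds_per_subject <- tapply(fold_id, subject_id, function(f) length(unique(f)))
stopifnot(all(folds_per_subject == 1))
```

The resulting vector has one entry per row of x and could then be supplied as the fold_id argument.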