predict.gpb.Booster: Predict method for GPBoost model

Description

Predicted values based on class gpb.Booster

Usage

# S3 method for gpb.Booster
predict(object, data, start_iteration = NULL,
  num_iteration = NULL, rawscore = FALSE, predleaf = FALSE,
  predcontrib = FALSE, header = FALSE, reshape = FALSE, ...)

Arguments

object

Object of class gpb.Booster

data

a matrix object, a dgCMatrix object or a character representing a filename

start_iteration

int or None, optional (default=None) Start index of the iteration to predict. If None or <= 0, starts from the first iteration.

num_iteration

int or None, optional (default=None) Limit number of iterations in the prediction. If None, if the best iteration exists and start_iteration is None or <= 0, the best iteration is used; otherwise, all iterations from start_iteration are used. If <= 0, all iterations from start_iteration are used (no limits).

rawscore

whether the prediction should be returned in the for of original untransformed sum of predictions from boosting iterations' results. E.g., setting rawscore=TRUE for logistic regression would result in predictions for log-odds instead of probabilities.

predleaf

whether predict leaf index instead.

predcontrib

return per-feature contributions for each record.

header

only used for prediction for text file. True if text file has header

reshape

whether to reshape the vector of predictions to a matrix form when there are several prediction outputs per case.

...

Additional named arguments passed to the predict() method of the gpb.Booster object passed to object. In particular, this includes prediction data for the GPModel (if there is one) Set predict_var or predict_cov_mat to TRUE, if you want to obtain predicted variances or covariances

Value

For regression or binary classification, it returns a vector of length nrows(data). For multiclass classification, either a num_class * nrows(data) vector or a (nrows(data), num_class) dimension matrix is returned, depending on the reshape value.

When predleaf = TRUE, the output is a matrix object with the number of columns corresponding to the number of trees.

Examples

Run this code

# NOT RUN {
# See https://github.com/fabsig/GPBoost/tree/master/R-package for more examples

library(gpboost)
data(GPBoost_data, package = "gpboost")

#--------------------Combine tree-boosting and grouped random effects model----------------
# Create random effects model
gp_model <- GPModel(group_data = group_data[,1], likelihood = "gaussian")
# The default optimizer for covariance parameters for Gaussian data is Fisher scoring.
# For non-Gaussian data, gradient descent is used.
# Optimizer properties can be changed as follows:
# re_params <- list(optimizer_cov = "gradient_descent", use_nesterov_acc = TRUE)
# gp_model$set_optim_params(params=re_params)
# Use trace = TRUE to monitor convergence:
# re_params <- list(trace = TRUE)
# gp_model$set_optim_params(params=re_params)

# Train model
bst <- gpboost(data = X,
               label = y,
               gp_model = gp_model,
               nrounds = 16,
               learning_rate = 0.05,
               max_depth = 6,
               min_data_in_leaf = 5,
               objective = "regression_l2",
               verbose = 0)
# Estimated random effects model
summary(gp_model)

# Make predictions
pred <- predict(bst, data = X_test, group_data_pred = group_data_test[,1],
                predict_var= TRUE)
pred$random_effect_mean # Predicted mean
pred$random_effect_cov # Predicted variances
pred$fixed_effect # Predicted fixed effect from tree ensemble
# Sum them up to otbain a single prediction
pred$random_effect_mean + pred$fixed_effect

# }
# NOT RUN {
#--------------------Combine tree-boosting and Gaussian process model----------------
# Create Gaussian process model
gp_model <- GPModel(gp_coords = coords, cov_function = "exponential",
                    likelihood = "gaussian")
# Train model
bst <- gpboost(data = X,
               label = y,
               gp_model = gp_model,
               nrounds = 8,
               learning_rate = 0.1,
               max_depth = 6,
               min_data_in_leaf = 5,
               objective = "regression_l2",
               verbose = 0)
# Estimated random effects model
summary(gp_model)
# Make predictions
pred <- predict(bst, data = X_test, gp_coords_pred = coords_test,
                predict_cov_mat =TRUE)
pred$random_effect_mean # Predicted (posterior) mean of GP
pred$random_effect_cov # Predicted (posterior) covariance matrix of GP
pred$fixed_effect # Predicted fixed effect from tree ensemble
# Sum them up to otbain a single prediction
pred$random_effect_mean + pred$fixed_effect
# }

Run the code above in your browser using DataLab