
⚠️ There's a newer version (1.5.6) of this package.

GPBoost R Package

This is the R package implementation of the GPBoost library. See https://github.com/fabsig/GPBoost for more information on the modeling background and the software implementation.


Examples

Here is a short example:

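# Combine tree-boosting and grouped random effects model
library(gpboost)
# Load the example data (e.g. X, y, group_data, X_test, group_data_test)
data(GPBoost_data, package = "gpboost")
# Define a grouped random effects model
gp_model <- GPModel(group_data = group_data)
# Train the tree-boosting part jointly with the random effects model
bst <- gpboost(data = X, label = y, gp_model = gp_model,
               nrounds = 10, objective = "regression_l2")
# Summary of the fitted random effects model
summary(gp_model)
# Predict for new data, passing the group labels of the test data
pred <- predict(bst, data = X_test, group_data_pred = group_data_test)
pred$response_mean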

Installation

Installation from CRAN

The gpboost package is available on CRAN and can be installed as follows:

install.packages("gpboost", repos = "https://cran.r-project.org")

Installation from source

Installing the package from CRAN is much easier. However, the package can also be built from source as described below. In short, the main installation steps are:

  • Install git
  • Install CMake
  • Install Rtools (for Windows only). Choose the option 'add rtools to system PATH'.
  • Make sure that you have an appropriate C++ compiler (see below for more details). For example, on Windows, simply download the free Visual Studio Community Edition and do not forget to select 'Desktop development with C++' when installing it.
  • Install the GPBoost package from the command line using:
git clone --recursive https://github.com/fabsig/GPBoost
cd GPBoost
Rscript build_r.R

Below is a more complete installation guide.

Preparation

You need to install git and CMake first. Note that 32-bit R/Rtools is not supported for custom installation.
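Before building, it can help to confirm from within R that both tools are visible and that the session is 64-bit. The following is a minimal sketch; `check_build_prereqs()` is a hypothetical helper, not part of gpboost, and it only checks the PATH as seen by the current R session.

```r
# Hypothetical helper (not part of gpboost): check that git and CMake are on
# the PATH and that the running R session is 64-bit.
check_build_prereqs <- function(tools = c("git", "cmake")) {
  # Sys.which() returns "" for executables it cannot find on the PATH
  found <- nzchar(Sys.which(tools))
  names(found) <- tools
  # A pointer size of 8 bytes indicates a 64-bit build of R
  is_64bit <- .Machine$sizeof.pointer == 8
  list(tools_found = found, r_is_64bit = is_64bit)
}

prereqs <- check_build_prereqs()
print(prereqs)
if (!all(prereqs$tools_found)) {
  message("Missing on PATH: ",
          paste(names(prereqs$tools_found)[!prereqs$tools_found], collapse = ", "))
}
if (!prereqs$r_is_64bit) {
  message("32-bit R is not supported for building from source.")
}
```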

Windows Preparation

NOTE: Windows users may need to run with administrator rights (either R or the command prompt, depending on the way you are installing this package).

Installing a 64-bit version of Rtools is mandatory.

After installing Rtools and CMake, be sure the following paths are added to the environment variable PATH. These may have been automatically added when installing other software.

  • Rtools
    • If you have Rtools 3.x, example:
      • C:\Rtools\mingw_64\bin
    • If you have Rtools 4.x, example (NOTE: two paths are required):
      • C:\rtools40\mingw64\bin
      • C:\rtools40\usr\bin
      • For instance, when installing in R with install.packages(), these paths can be added locally in R as follows prior to installation:
Sys.setenv(PATH = paste0(Sys.getenv("PATH"), ";C:\\rtools40\\mingw64\\bin;C:\\rtools40\\usr\\bin"))
  • CMake
    • example: C:\Program Files\CMake\bin
  • R
    • example: C:\Program Files\R\R-3.6.1\bin
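Appending to PATH repeatedly in the same session can add duplicate entries. The sketch below uses a hypothetical helper `add_to_path()` (not part of gpboost or base R) that only appends directories not already present; it compares entries as plain strings, so differences in case or trailing separators are not detected. The Rtools 4.x directories are examples and should be adjusted to your installation.

```r
# Hypothetical helper (not part of gpboost): append directories to PATH for
# the current R session only, skipping entries that are already present.
add_to_path <- function(dirs) {
  sep <- .Platform$path.sep  # ";" on Windows, ":" elsewhere
  current <- strsplit(Sys.getenv("PATH"), sep, fixed = TRUE)[[1]]
  new_dirs <- setdiff(dirs, current)
  if (length(new_dirs) > 0) {
    Sys.setenv(PATH = paste(c(current, new_dirs), collapse = sep))
  }
  invisible(new_dirs)  # the directories that were actually added
}

# Example for Rtools 4.x on Windows (adjust to your installation directories)
if (.Platform$OS.type == "windows") {
  add_to_path(c("C:\\rtools40\\mingw64\\bin", "C:\\rtools40\\usr\\bin"))
}
```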

The default compiler on Windows is Visual Studio (or VS Build Tools), with an automatic fallback to MinGW64 (i.e. it is enough to only have Rtools and CMake). To force the usage of MinGW64, you can add the --use-mingw (for R 3.x) or --use-msys2 (for R 4.x) flag (see below).
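The version rule for these flags can be expressed in a few lines of R. This is a sketch with a hypothetical helper `mingw_flag()`; it assumes that R versions newer than 4.x should use the same flag as R 4.x, which the text above does not state.

```r
# Hypothetical helper: choose the flag that forces the MinGW-based toolchain,
# following the rule above (--use-mingw for R 3.x, --use-msys2 for R 4.x).
# Assumption: R versions above 4 are treated like R 4.x.
mingw_flag <- function(r_major = R.version$major) {
  major <- as.integer(r_major)  # R.version$major is a character string
  if (major >= 4) "--use-msys2" else "--use-mingw"
}

mingw_flag()
# The flag would then be passed to the build script from inside the GPBoost
# directory, e.g.: system2("Rscript", c("build_r.R", mingw_flag()))
```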

Mac OS Preparation

You can perform installation either with Apple Clang or gcc.

  • If you prefer Apple Clang, first install OpenMP (details for installation can be found in the Installation Guide); CMake version 3.12 or higher is required. Only Apple Clang version 8.1 or higher is supported.
  • If you prefer gcc, you need to install it (details for installation can be found in the Installation Guide) and set some environment variables to tell R to use gcc and g++. If you install these from Homebrew, your versions of g++ and gcc are most likely in /usr/local/bin, as shown below.
# replace 8 with version of gcc installed on your machine
export CXX=/usr/local/bin/g++-8 CC=/usr/local/bin/gcc-8
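For the Apple Clang route, the CMake minimum version can be checked from R. This is a minimal sketch using a hypothetical helper `cmake_version_ok()`; it assumes the first line of `cmake --version` has the form "cmake version X.Y.Z" and only calls CMake if it is found on the PATH.

```r
# Hypothetical helper: check that a CMake version string meets a minimum
# (3.12 for Apple Clang builds). Expects a line like "cmake version 3.22.1".
cmake_version_ok <- function(version_line, minimum = "3.12") {
  # Extract the first dotted version number, e.g. "3.22.1"
  ver <- regmatches(version_line, regexpr("[0-9]+(\\.[0-9]+)+", version_line))
  if (length(ver) == 0) return(FALSE)  # no version number found
  package_version(ver) >= minimum
}

if (nzchar(Sys.which("cmake"))) {
  first_line <- system2("cmake", "--version", stdout = TRUE)[1]
  print(cmake_version_ok(first_line))
}
```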

Install

Build and install the R package with the following commands:

git clone --recursive https://github.com/fabsig/GPBoost
cd GPBoost
Rscript build_r.R

The build_r.R script builds the package in a temporary directory called gpboost_r. It will destroy and recreate that directory each time you run the script. That script supports the following command-line options:

  • --skip-install: Build the package tarball, but do not install it.
  • --use-gpu: Build a GPU-enabled version of the library.
  • --use-mingw: Force the use of MinGW toolchain, regardless of R version.
  • --use-msys2: Force the use of MSYS2 toolchain, regardless of R version.
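If you prefer to drive the build from an R session, the options above can be assembled into an argument vector for `system2()`. The sketch below uses a hypothetical wrapper `build_gpboost_args()` (not part of GPBoost) and assumes the working directory is the root of the cloned repository, since `build_r.R` is referenced by a relative path. The actual build call is left commented out.

```r
# Hypothetical wrapper (not part of GPBoost): assemble the build_r.R
# command-line options listed above into an argument vector.
build_gpboost_args <- function(skip_install = FALSE, use_gpu = FALSE,
                               toolchain = c("default", "mingw", "msys2")) {
  toolchain <- match.arg(toolchain)
  args <- "build_r.R"
  if (skip_install) args <- c(args, "--skip-install")
  if (use_gpu) args <- c(args, "--use-gpu")
  if (toolchain != "default") args <- c(args, paste0("--use-", toolchain))
  args
}

args <- build_gpboost_args(skip_install = TRUE)
print(args)
# From the root of the GPBoost repository:
# status <- system2("Rscript", args)
```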

Note: when building with Visual Studio/VS Build Tools on Windows, run the commands from the Windows CMD or PowerShell.

Testing

There is currently no continuous integration service set up that automatically runs the unit tests. However, any contribution needs to pass all unit tests in the R-package/tests/testthat directory. These tests can be run using the run_tests_coverage_R_package.R file. In any case, make sure that you run the full set of tests by setting the following environment variable

Sys.setenv(GPBOOST_ALL_TESTS = "GPBOOST_ALL_TESTS")

before running the tests in the R-package/tests/testthat directory.
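As an alternative to the coverage script, the tests can also be run directly with testthat. This is a sketch that assumes the working directory is the root of the GPBoost repository and that the testthat package is installed; it simply skips the run otherwise.

```r
# Enable the full test suite for this R session
Sys.setenv(GPBOOST_ALL_TESTS = "GPBOOST_ALL_TESTS")

# Assumes the working directory is the root of the GPBoost repository
test_dir_path <- file.path("R-package", "tests", "testthat")
if (requireNamespace("testthat", quietly = TRUE) && dir.exists(test_dir_path)) {
  testthat::test_dir(test_dir_path)
}
```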

Monthly Downloads

572

Version

0.7.9

License

Apache License (== 2.0) | file LICENSE

Maintainer

Fabio Sigrist

Last Published

August 25th, 2022

Functions in gpboost (0.7.9)

gpb.Dataset.construct

Construct Dataset explicitly
fitGPModel

Fits a GPModel
getinfo

Get information of a gpb.Dataset object
gpb.Dataset.create.valid

Construct validation data
dimnames.gpb.Dataset

Handling of column names of gpb.Dataset
fit

Generic 'fit' method for a GPModel
dim.gpb.Dataset

Dimensions of a gpb.Dataset
gpb.Dataset

Construct gpb.Dataset object
get_nested_categories

Auxiliary function to create categorical variables for nested grouped random effects
fit.GPModel

Fits a GPModel
gpb.cv

CV function for number of boosting iterations
gpb.interprete

Compute feature contribution of prediction
gpb.get.eval.result

Get record evaluation result from booster
gpb.dump

Dump GPBoost model to json
gpb.Dataset.save

Save gpb.Dataset to a binary file
gpb.convert_with_rules

Data preparator for GPBoost datasets with rules (integer)
gpb.Dataset.set.categorical

Set categorical feature of gpb.Dataset
gpb.importance

Compute feature importance in a model
gpb.Dataset.set.reference

Set reference of gpb.Dataset
gpb.grid.search.tune.parameters

Function for choosing tuning parameters
gpb.model.dt.tree

Parse a GPBoost model json dump
gpb.load

Load GPBoost model
gpb.plot.importance

Plot feature importance as a bar graph
gpb_shared_params

Shared parameter docs
gpb.plot.part.dep.interact

Plot interaction partial dependence plots
gpb.plot.partial.dependence

Plot partial dependence plots
gpb.save

Save GPBoost model
gpb.plot.interpretation

Plot feature contribution as a bar graph
gpboost

Train a GPBoost model
gpb.train

Main training logic for GPBoost
loadGPModel

Load a GPModel from a file
predict_training_data_random_effects

Generic 'predict_training_data_random_effects' method for a GPModel
group_data_test

Example data for the GPBoost package
group_data

Example data for the GPBoost package
predict.GPModel

Make predictions for a GPModel
saveRDS.gpb.Booster

saveRDS for gpb.Booster models
saveGPModel

Save a GPModel
predict_training_data_random_effects.GPModel

Predict ("estimate") training data random effects for a GPModel
readRDS.gpb.Booster

readRDS for gpb.Booster models
predict.gpb.Booster

Prediction function for gpb.Booster objects
setinfo

Set information of a gpb.Dataset object
set_prediction_data.GPModel

Set prediction data for a GPModel
set_prediction_data

Generic 'set_prediction_data' method for a GPModel
summary.GPModel

Summary for a GPModel
y

Example data for the GPBoost package
slice

Slice a dataset
GPModel_shared_params

Shared parameter docs
agaricus.train

Training part from Mushroom Data Set
GPModel

Create a GPModel object
X

Example data for the GPBoost package
X_test

Example data for the GPBoost package
agaricus.test

Test part from Mushroom Data Set
bank

Bank Marketing Data Set
coords

Example data for the GPBoost package
GPBoost_data

Example data for the GPBoost package
coords_test

Example data for the GPBoost package