GPModel object
Create a GPModel which contains a Gaussian process and / or mixed effects model with grouped random effects.
GPModel(likelihood = "gaussian", group_data = NULL,
group_rand_coef_data = NULL, ind_effect_group_rand_coef = NULL,
drop_intercept_group_rand_effect = NULL, gp_coords = NULL,
gp_rand_coef_data = NULL, cov_function = "exponential",
cov_fct_shape = 0.5, gp_approx = "none", cov_fct_taper_range = 1,
cov_fct_taper_shape = 0, num_neighbors = 20L,
vecchia_ordering = "random", num_ind_points = 500L,
matrix_inversion_method = "cholesky", seed = 0L, cluster_ids = NULL,
free_raw_data = FALSE, vecchia_approx = NULL, vecchia_pred_type = NULL,
num_neighbors_pred = NULL)
Value: A GPModel containing a Gaussian process and / or mixed effects model with grouped random effects.
likelihood: A string specifying the likelihood function (distribution) of the response variable.
Available options:
"gaussian"
"bernoulli_probit": binary data with Bernoulli likelihood and a probit link function
"bernoulli_logit": binary data with Bernoulli likelihood and a logit link function
"gamma"
"poisson"
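The two binary options differ only in their link function. Under "bernoulli_probit", the success probability is the standard normal CDF of the latent value. Under "bernoulli_logit", it is the logistic function of that value. The base-R sketch below shows this mapping. The latent values `eta` are made-up illustration data, and none of this is gpboost code.

```r
# Hypothetical latent values (e.g., fixed effects plus random effects)
eta <- c(-2, -0.5, 0, 0.5, 2)

# Probit link: P(y = 1) = Phi(eta), the standard normal CDF
p_probit <- pnorm(eta)

# Logit link: P(y = 1) = 1 / (1 + exp(-eta))
p_logit <- plogis(eta)

round(cbind(eta, p_probit, p_logit), 3)
```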
group_data: A vector or matrix whose columns are categorical grouping variables.
The elements are group levels defining grouped random effects.
The elements of 'group_data' can be integer, double, or character.
The number of columns corresponds to the number of grouped (intercept) random effects.
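A single grouping variable can be pictured as an incidence matrix Z with one column per group level. Observations that share a level are then correlated through the shared random effect. The minimal base-R sketch below assumes a Gaussian likelihood and made-up variance values, and it is not how gpboost builds its matrices internally. With a matrix 'group_data', each column would contribute one such block of Z.

```r
# Hypothetical single grouping variable with three levels
group <- c("a", "a", "b", "c", "c", "c")

# Incidence matrix Z: Z[i, j] = 1 if observation i belongs to level j
Z <- model.matrix(~ 0 + factor(group))

sigma2_group <- 0.5   # assumed random-effect variance
sigma2_error <- 1     # assumed error variance

# Implied marginal covariance of y: sigma2_group * Z Z' + sigma2_error * I
Sigma <- sigma2_group * Z %*% t(Z) + sigma2_error * diag(length(group))
Sigma
```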
group_rand_coef_data: A vector or matrix with numeric covariate data
for grouped random coefficients.
ind_effect_group_rand_coef: A vector with integer indices that
indicate the corresponding categorical grouping variable (=columns) in 'group_data' for
every covariate in 'group_rand_coef_data'. Counting starts at 1.
The length of this index vector must equal the number of covariates in 'group_rand_coef_data'.
For instance, c(1,1,2) means that the first two covariates (=first two columns) in 'group_rand_coef_data'
have random coefficients corresponding to the first categorical grouping variable (=first column) in 'group_data',
and the third covariate (=third column) in 'group_rand_coef_data' has a random coefficient
corresponding to the second grouping variable (=second column) in 'group_data'.
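The c(1,1,2) example can be made concrete as a random-effects design matrix. Each random coefficient reuses the indicator columns of its grouping variable, with every row scaled by that observation's covariate value. The base-R sketch below uses hypothetical data and the hypothetical helper `intercept_Z`. It illustrates the model structure only and is not gpboost's internal construction.

```r
# Hypothetical data: 4 observations, two grouping variables (columns of group_data)
group_data <- cbind(g1 = c(1, 1, 2, 2), g2 = c(1, 2, 1, 2))
# Hypothetical covariates with random coefficients
group_rand_coef_data <- cbind(x1 = c(0.1, 0.2, 0.3, 0.4),
                              x2 = c(1, 2, 3, 4),
                              x3 = c(-1, 0, 1, 2))
# x1 and x2 get random slopes for g1, x3 gets a random slope for g2
ind_effect_group_rand_coef <- c(1, 1, 2)

# Hypothetical helper: indicator matrix of grouping variable k (one column per level)
intercept_Z <- function(k) {
  g <- factor(group_data[, k])
  model.matrix(~ 0 + g)
}

# Design for covariate j: indicators of its grouping variable,
# each row multiplied by the observation's covariate value
slope_Z <- lapply(seq_along(ind_effect_group_rand_coef), function(j) {
  intercept_Z(ind_effect_group_rand_coef[j]) * group_rand_coef_data[, j]
})

# Full design: intercept effects for g1 and g2, then the three random slopes
Z <- do.call(cbind, c(list(intercept_Z(1), intercept_Z(2)), slope_Z))
dim(Z)
```

In this sketch, setting drop_intercept_group_rand_effect[k] to TRUE would correspond to leaving `intercept_Z(k)` out of `Z` while keeping the slopes.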
drop_intercept_group_rand_effect: A vector of type logical (boolean).
Indicates whether intercept random effects are dropped (only for random coefficients).
If drop_intercept_group_rand_effect[k] is TRUE, the intercept random effect number k is dropped / not included.
Only random effects with random slopes can be dropped.
gp_coords: A matrix with numeric coordinates (= inputs / features) for defining Gaussian processes.
gp_rand_coef_data: A vector or matrix with numeric covariate data for
Gaussian process random coefficients.
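A Gaussian process random coefficient can be read as a term x_i * b(s_i). Here b is a Gaussian process over the coordinates and x_i is the covariate value. If b has covariance matrix K, this term has covariance diag(x) K diag(x). The base-R sketch below assumes an exponential covariance with made-up parameters and hypothetical data. It shows the implied covariance only, not gpboost's implementation.

```r
# Hypothetical coordinates and covariate for the random coefficient
coords <- cbind(c(0, 0.2, 0.5, 0.9), c(0, 0.1, 0.4, 0.3))
x <- c(1.5, -0.5, 2, 1)
sigma2 <- 1; rho <- 0.3   # assumed marginal variance and range

# Exponential covariance of the underlying process b(s)
K <- sigma2 * exp(-as.matrix(dist(coords)) / rho)

# Covariance of x_i * b(s_i): scale rows and columns by the covariate
K_coef <- diag(x) %*% K %*% diag(x)
round(K_coef, 3)
```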
cov_function: A string specifying the covariance function for the Gaussian process.
Available options:
"exponential", "gaussian", "matern", "powered_exponential", "wendland"
For "exponential", "gaussian", and "powered_exponential", we use the parametrization of Diggle and Ribeiro (2007).
For "matern", we use the parametrization of Rasmussen and Williams (2006).
For "wendland", we use the parametrization of Bevilacqua et al. (2019, AOS).
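The sketch below gives the three Diggle and Ribeiro style families in terms of a variance sigma2, a range rho, and, for the powered exponential, a shape. It is written from our reading of that parametrization and is not taken from gpboost's source. The helper `cov_dr` is hypothetical.

```r
# Hypothetical helper: covariance as a function of distance d
cov_dr <- function(d, sigma2, rho,
                   family = c("exponential", "gaussian", "powered_exponential"),
                   shape = 1) {
  family <- match.arg(family)
  r <- d / rho
  switch(family,
         exponential         = sigma2 * exp(-r),        # exp(-d / rho)
         gaussian            = sigma2 * exp(-r^2),      # exp(-(d / rho)^2)
         powered_exponential = sigma2 * exp(-r^shape))  # exp(-(d / rho)^shape)
}

d <- c(0, 0.1, 0.5, 1)
cov_dr(d, sigma2 = 1, rho = 0.5, family = "exponential")
```

A powered exponential with shape 1 reduces to the exponential, and one with shape 2 reduces to the Gaussian.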
cov_fct_shape: A numeric specifying the shape parameter of the covariance function
(= smoothness parameter for the Matern covariance).
This parameter is irrelevant for some covariance functions such as the exponential or Gaussian.
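For the Matern covariance, the shape is the smoothness nu. In the Rasmussen and Williams (2006) form, nu = 0.5 recovers the exponential covariance, and nu = 1.5 and nu = 2.5 have well-known closed forms. The base-R sketch below checks this with the hypothetical helper `matern_rw`. It follows the textbook formula rather than gpboost's code.

```r
# Hypothetical helper: Matern covariance (Rasmussen and Williams form)
matern_rw <- function(d, sigma2, rho, nu) {
  out <- rep(sigma2, length(d))  # limit at d = 0 is sigma2
  pos <- d > 0
  z <- sqrt(2 * nu) * d[pos] / rho
  out[pos] <- sigma2 * 2^(1 - nu) / gamma(nu) * z^nu * besselK(z, nu)
  out
}

d <- c(0, 0.05, 0.2, 0.6)
rho <- 0.3

# Closed forms for half-integer smoothness (variance 1)
k_05 <- exp(-d / rho)
k_15 <- (1 + sqrt(3) * d / rho) * exp(-sqrt(3) * d / rho)
k_25 <- (1 + sqrt(5) * d / rho + 5 * d^2 / (3 * rho^2)) * exp(-sqrt(5) * d / rho)

rbind(nu_0.5 = matern_rw(d, 1, rho, 0.5),
      nu_1.5 = matern_rw(d, 1, rho, 1.5),
      nu_2.5 = matern_rw(d, 1, rho, 2.5))
```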
gp_approx: A string specifying the large data approximation
for Gaussian processes. Available options:
"none": No approximation
"vecchia": A Vecchia approximation; see Sigrist (2022, JMLR) for more details
"tapering": The covariance function is multiplied by a compactly supported Wendland correlation function
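A Vecchia approximation replaces the joint Gaussian density by a product of conditionals p(y_i | y_N(i)). Each conditioning set N(i) holds at most m nearest neighbors among the points that come earlier in the ordering. The base-R sketch below compares it to the exact log-likelihood for a zero-mean Gaussian process. It uses the data order as given, simulated data, and dense matrices, all of which are simplifying assumptions. With m = n - 1 the factorization is exact.

```r
# Exact zero-mean Gaussian log-likelihood via a Cholesky factorization
exact_loglik <- function(y, Sigma) {
  R <- chol(Sigma)                        # Sigma = t(R) %*% R
  z <- backsolve(R, y, transpose = TRUE)  # solves t(R) z = y
  -0.5 * length(y) * log(2 * pi) - sum(log(diag(R))) - 0.5 * sum(z^2)
}

# Vecchia log-likelihood with up to m nearest previously ordered neighbors
vecchia_loglik <- function(y, Sigma, D, m) {
  ll <- dnorm(y[1], 0, sqrt(Sigma[1, 1]), log = TRUE)
  for (i in seq_along(y)[-1]) {
    prev <- seq_len(i - 1)
    nb <- prev[order(D[i, prev])][seq_len(min(m, i - 1))]
    s_in <- Sigma[i, nb]
    w <- solve(Sigma[nb, nb, drop = FALSE], s_in)  # kriging weights
    mu <- sum(w * y[nb])                           # conditional mean
    v <- Sigma[i, i] - sum(w * s_in)               # conditional variance
    ll <- ll + dnorm(y[i], mu, sqrt(v), log = TRUE)
  }
  ll
}

set.seed(1)
n <- 30
coords <- matrix(runif(2 * n), ncol = 2)
D <- as.matrix(dist(coords))
Sigma <- exp(-D / 0.2) + diag(0.01, n)  # exponential covariance plus small nugget
y <- drop(t(chol(Sigma)) %*% rnorm(n))  # simulated zero-mean GP data

ll_exact <- exact_loglik(y, Sigma)
ll_vecchia_full <- vecchia_loglik(y, Sigma, D, m = n - 1)  # exact factorization
ll_vecchia_5 <- vecchia_loglik(y, Sigma, D, m = 5)         # approximation
c(ll_exact, ll_vecchia_full, ll_vecchia_5)
```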
cov_fct_taper_range: A numeric specifying the range parameter
of the Wendland covariance function and Wendland correlation taper function.
We follow the notation of Bevilacqua et al. (2019, AOS).
cov_fct_taper_shape: A numeric specifying the shape (=smoothness) parameter
of the Wendland covariance function and Wendland correlation taper function.
We follow the notation of Bevilacqua et al. (2019, AOS).
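Tapering multiplies the covariance matrix elementwise by a compactly supported correlation matrix. Every entry beyond the taper range becomes exactly zero, which makes the matrix sparse. The base-R sketch below uses the classical Wendland function (1 - r)^4 (1 + 4 r) with r = d / taper_range. That specific member of the Wendland family is our assumption for illustration and need not match the taper gpboost uses for a given 'cov_fct_taper_shape'. The helper `wendland_taper` is hypothetical.

```r
# Hypothetical taper: Wendland function (1 - r)^4 (1 + 4 r), zero for r >= 1
wendland_taper <- function(d, taper_range) {
  r <- d / taper_range
  ifelse(r < 1, (1 - r)^4 * (1 + 4 * r), 0)
}

set.seed(2)
coords <- matrix(runif(2 * 50), ncol = 2)
D <- as.matrix(dist(coords))
Sigma <- exp(-D / 0.3)                                      # exponential covariance
Sigma_tap <- Sigma * wendland_taper(D, taper_range = 0.25)  # elementwise product

mean(Sigma_tap == 0)  # share of exact zeros
```

By the Schur product theorem, the tapered matrix stays positive definite because both factors are valid covariance matrices.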
num_neighbors: An integer specifying the number of neighbors for
the Vecchia approximation. Note: for prediction, the number of neighbors can
be set through the 'num_neighbors_pred' parameter in the 'set_prediction_data'
function. By default, num_neighbors_pred = 2 * num_neighbors. Further,
the type of Vecchia approximation used for making predictions is set through
the 'vecchia_pred_type' parameter in the 'set_prediction_data' function.
vecchia_ordering: A string specifying the ordering used in
the Vecchia approximation. Available options:
"none": the default ordering in the data is used
"random": a random ordering
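The ordering determines which points count as "previous" when each point's Vecchia neighbor set is chosen. A random ordering is drawn from a seed, so a fixed seed gives a reproducible ordering. The base-R sketch below builds the neighbor sets for both options. The helper `vecchia_neighbors` is hypothetical, and it uses R's own random number generator, so its random ordering will not match the one gpboost produces for the same 'seed'.

```r
# Hypothetical helper: ordering plus nearest previously ordered neighbors
vecchia_neighbors <- function(coords, num_neighbors,
                              ordering = c("none", "random"), seed = 0L) {
  ordering <- match.arg(ordering)
  n <- nrow(coords)
  ord <- seq_len(n)                 # "none": keep the data order
  if (ordering == "random") {
    set.seed(seed)                  # same seed, same ordering
    ord <- sample.int(n)
  }
  D <- as.matrix(dist(coords[ord, , drop = FALSE]))
  nbrs <- vector("list", n)
  nbrs[[1]] <- integer(0)           # the first point conditions on nothing
  for (i in seq_len(n)[-1]) {
    prev <- seq_len(i - 1)
    nbrs[[i]] <- prev[order(D[i, prev])][seq_len(min(num_neighbors, i - 1))]
  }
  list(order = ord, neighbors = nbrs)  # neighbor indices refer to the new order
}

set.seed(42)
coords <- matrix(runif(2 * 20), ncol = 2)
nb_random <- vecchia_neighbors(coords, num_neighbors = 3, ordering = "random", seed = 0L)
nb_none <- vecchia_neighbors(coords, num_neighbors = 3, ordering = "none")
nb_random$neighbors[1:5]
```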
num_ind_points: An integer specifying the number of inducing
points / knots for, e.g., a predictive process approximation.
matrix_inversion_method: A string specifying the method used for inverting covariance matrices.
Available options:
"cholesky": Cholesky factorization
"iterative": iterative methods. Only supported for non-Gaussian likelihoods with a Vecchia-Laplace approximation. This is a combination of conjugate gradient, Lanczos algorithm, and other methods
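The difference between the two options can be seen on a single linear system Sigma x = b. A Cholesky factorization solves it directly, while conjugate gradient reaches the same solution iteratively using only matrix-vector products. The base-R sketch below shows plain conjugate gradient on a small dense covariance matrix. It omits the Lanczos steps, preconditioning, and sparse Vecchia-Laplace structure that gpboost's "iterative" option combines. The helper `conjugate_gradient` is hypothetical.

```r
# Hypothetical helper: plain conjugate gradient for a symmetric positive definite A
conjugate_gradient <- function(A, b, tol = 1e-10, max_iter = 1000) {
  x <- numeric(length(b))
  r <- b - drop(A %*% x)  # residual
  p <- r                  # search direction
  rs_old <- sum(r^2)
  for (k in seq_len(max_iter)) {
    Ap <- drop(A %*% p)
    alpha <- rs_old / sum(p * Ap)
    x <- x + alpha * p
    r <- r - alpha * Ap
    rs_new <- sum(r^2)
    if (sqrt(rs_new) < tol) break
    p <- r + (rs_new / rs_old) * p
    rs_old <- rs_new
  }
  x
}

set.seed(3)
coords <- matrix(runif(2 * 40), ncol = 2)
A <- exp(-as.matrix(dist(coords)) / 0.1) + diag(0.5, 40)  # covariance plus nugget
b <- rnorm(40)

x_cg <- conjugate_gradient(A, b)
R <- chol(A)                                                # A = t(R) %*% R
x_chol <- backsolve(R, backsolve(R, b, transpose = TRUE))  # direct solve
max(abs(x_cg - x_chol))
```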
seed: An integer specifying the seed used for model creation
(e.g., random ordering in the Vecchia approximation).
cluster_ids: A vector with elements indicating independent realizations of
random effects / Gaussian processes (same values = same process realization).
The elements of 'cluster_ids' can be integer, double, or character.
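Independent realizations mean zero covariance between observations with different cluster ids, even when their coordinates are close. The covariance matrix is therefore block diagonal once sorted by cluster. The base-R sketch below uses hypothetical data and an exponential covariance with made-up parameters. It illustrates the implied structure, not gpboost's storage.

```r
# Hypothetical data: five locations belonging to two independent realizations
cluster_ids <- c(1, 1, 1, 2, 2)
coords <- cbind(c(0, 0.1, 0.3, 0.05, 0.2), c(0, 0.1, 0.2, 0.05, 0.1))
D <- unname(as.matrix(dist(coords)))

# Exponential covariance (variance 1, range 0.2) within a cluster
Sigma <- exp(-D / 0.2)
# Zero covariance across clusters
Sigma[outer(cluster_ids, cluster_ids, "!=")] <- 0
round(Sigma, 3)
```

Observation 4 lies close to observation 1 but has zero covariance with it because it belongs to a different cluster.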
free_raw_data: A boolean. If TRUE, the data (groups, coordinates, covariate data for random coefficients)
is freed in R after initialization.
vecchia_approx: Discontinued. Use the argument 'gp_approx' instead.
vecchia_pred_type: A string specifying the type of Vecchia approximation used for making predictions.
Discontinued here. Use the function 'set_prediction_data' to specify this.
num_neighbors_pred: An integer specifying the number of neighbors for making predictions.
Discontinued here. Use the function 'set_prediction_data' to specify this.
Author: Fabio Sigrist
# See https://github.com/fabsig/GPBoost/tree/master/R-package for more examples
data(GPBoost_data, package = "gpboost")
#--------------------Grouped random effects model: single-level random effect----------------
gp_model <- GPModel(group_data = group_data[, 1], likelihood = "gaussian")
#--------------------Gaussian process model----------------
gp_model <- GPModel(gp_coords = coords, cov_function = "exponential",
                    likelihood = "gaussian")
#--------------------Combine Gaussian process with grouped random effects----------------
gp_model <- GPModel(group_data = group_data,
                    gp_coords = coords, cov_function = "exponential",
                    likelihood = "gaussian")