GPModel
Estimates the parameters of a GPModel
by maximizing the marginal likelihood
fitGPModel(likelihood = "gaussian", group_data = NULL,
group_rand_coef_data = NULL, ind_effect_group_rand_coef = NULL,
drop_intercept_group_rand_effect = NULL, gp_coords = NULL,
gp_rand_coef_data = NULL, cov_function = "matern", cov_fct_shape = 1.5,
gp_approx = "none", num_parallel_threads = NULL,
cov_fct_taper_range = 1, cov_fct_taper_shape = 1, num_neighbors = NULL,
vecchia_ordering = "random", ind_points_selection = "kmeans++",
num_ind_points = NULL, cover_tree_radius = 1,
matrix_inversion_method = "cholesky", seed = 0L, cluster_ids = NULL,
free_raw_data = FALSE, y, X = NULL, params = list(),
vecchia_approx = NULL, vecchia_pred_type = NULL,
num_neighbors_pred = NULL, offset = NULL, fixed_effects = NULL,
likelihood_additional_param = NULL)
A fitted GPModel
A string
specifying the likelihood function (distribution) of the response variable.
Available options:
"gaussian"
"bernoulli_probit": binary data with Bernoulli likelihood and a probit link function
"bernoulli_logit": binary data with Bernoulli likelihood and a logit link function
"gamma": gamma distribution with a log link function
"poisson": Poisson distribution with a log link function
"negative_binomial": negative binomial distribution with a log link function
"t": t-distribution (e.g., for robust regression)
"t_fix_df": t-distribution with the degrees-of-freedom (df) held fixed and not estimated.
The df can be set via the likelihood_additional_param
parameter
"gaussian_heteroscedastic": Gaussian likelihood where both the mean and the variance are related to fixed and random effects. This is currently only implemented for GPs with a 'vecchia' approximation
Note: other likelihoods could be implemented upon request
A vector
or matrix
whose columns are categorical grouping variables.
The elements are group levels that define the grouped random effects.
The elements of 'group_data' can be integer, double, or character.
The number of columns corresponds to the number of grouped (intercept) random effects
A vector
or matrix
with numeric covariate data
for grouped random coefficients
A vector
with integer indices that
indicate the corresponding categorical grouping variable (=columns) in 'group_data' for
every covariate in 'group_rand_coef_data'. Counting starts at 1.
The length of this index vector must equal the number of covariates in 'group_rand_coef_data'.
For instance, c(1,1,2) means that the first two covariates (=first two columns) in 'group_rand_coef_data'
have random coefficients corresponding to the first categorical grouping variable (=first column) in 'group_data',
and the third covariate (=third column) in 'group_rand_coef_data' has a random coefficient
corresponding to the second grouping variable (=second column) in 'group_data'
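The c(1,1,2) mapping described above can be sketched with simulated data. This is a minimal illustration, not an official example: the simulated objects are hypothetical, and the model is only fitted if the gpboost package happens to be installed.

```r
# Sketch with simulated data; all object names are hypothetical
set.seed(1)
n <- 200
# Two categorical grouping variables (= columns of group_data)
group_data <- cbind(sample(1:10, n, replace = TRUE),
                    sample(1:5, n, replace = TRUE))
# Three numeric covariates that get random coefficients
group_rand_coef_data <- matrix(rnorm(n * 3), ncol = 3)
# Covariates 1 and 2 vary by the first grouping variable,
# covariate 3 varies by the second grouping variable
ind_effect_group_rand_coef <- c(1, 1, 2)
# One index per covariate, each pointing at an existing column of group_data
stopifnot(length(ind_effect_group_rand_coef) == ncol(group_rand_coef_data),
          all(ind_effect_group_rand_coef %in% seq_len(ncol(group_data))))
# Response with random intercepts for the first grouping variable plus noise
b1 <- rnorm(10)
y <- b1[group_data[, 1]] + rnorm(n, sd = 0.5)

if (requireNamespace("gpboost", quietly = TRUE)) {
  gp_model <- gpboost::fitGPModel(
    group_data = group_data,
    group_rand_coef_data = group_rand_coef_data,
    ind_effect_group_rand_coef = ind_effect_group_rand_coef,
    likelihood = "gaussian", y = y)
  summary(gp_model)
}
```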
A vector
of type logical
(boolean).
Indicates whether intercept random effects are dropped (only for random coefficients).
If drop_intercept_group_rand_effect[k] is TRUE, the intercept random effect number k is dropped / not included.
Only random effects with random slopes can be dropped.
A matrix
with numeric coordinates (= inputs / features) for defining Gaussian processes
A vector
or matrix
with numeric covariate data for
Gaussian process random coefficients
A string
specifying the covariance function for the Gaussian process.
Available options:
"matern": Matern covariance function with the smoothness specified by
the cov_fct_shape
parameter (using the parametrization of Rasmussen and Williams, 2006)
"matern_estimate_shape": same as "matern" but the smoothness parameter is also estimated
"matern_space_time": Spatio-temporal Matern covariance function with different range parameters for space and time.
Note that the first column in gp_coords
must correspond to the time dimension
"matern_ard": anisotropic Matern covariance function with Automatic Relevance Determination (ARD),
i.e., with a different range parameter for every coordinate dimension / column of gp_coords
"matern_ard_estimate_shape": same as "matern_ard" but the smoothness parameter is also estimated
"exponential": Exponential covariance function (using the parametrization of Diggle and Ribeiro, 2007)
"gaussian": Gaussian, aka squared exponential, covariance function (using the parametrization of Diggle and Ribeiro, 2007)
"gaussian_ard": anisotropic Gaussian, aka squared exponential, covariance function with Automatic Relevance Determination (ARD),
i.e., with a different range parameter for every coordinate dimension / column of gp_coords
"powered_exponential": powered exponential covariance function with the exponent specified by
the cov_fct_shape
parameter (using the parametrization of Diggle and Ribeiro, 2007)
"wendland": Compactly supported Wendland covariance function (using the parametrization of Bevilacqua et al., 2019, AOS)
A numeric
specifying the shape parameter of the covariance function
(e.g., smoothness parameter for Matern and Wendland covariance)
This parameter is irrelevant for some covariance functions such as the exponential or Gaussian
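To make the role of the shape parameter concrete, the following base R sketch evaluates the Matern covariance for smoothness 1.5 in the Rasmussen and Williams (2006) closed form, sigma2 * (1 + sqrt(3) * d / rho) * exp(-sqrt(3) * d / rho). The function name and the example coordinates are hypothetical, and this is not the package's internal implementation.

```r
# Matern covariance with smoothness 1.5 (Rasmussen and Williams closed form)
# d: distances, sigma2: marginal variance, rho: range parameter
matern_1_5 <- function(d, sigma2 = 1, rho = 1) {
  s <- sqrt(3) * d / rho
  sigma2 * (1 + s) * exp(-s)
}

# Three hypothetical 2-d locations
coords <- cbind(c(0, 0.1, 0.5), c(0, 0.2, 0.9))
D <- as.matrix(dist(coords))          # pairwise Euclidean distances
Sigma <- matern_1_5(D, sigma2 = 2, rho = 0.3)
Sigma                                  # diagonal equals the marginal variance
```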
A string
specifying the large data approximation
for Gaussian processes. Available options:
"none": No approximation
"vecchia": Vecchia approximation; see Sigrist (2022, JMLR) for more details
"full_scale_vecchia": Vecchia-inducing points full-scale (VIF) approximation; see Gyger, Furrer, and Sigrist (2025) for more details
"tapering": The covariance function is multiplied by a compactly supported Wendland correlation function
"fitc": Fully Independent Training Conditional approximation aka modified predictive process approximation; see Gyger, Furrer, and Sigrist (2024) for more details
"full_scale_tapering": Full-scale approximation combining an inducing point / predictive process approximation with tapering on the residual process; see Gyger, Furrer, and Sigrist (2024) for more details
"vecchia_latent": similar to "vecchia" but a Vecchia approximation is applied to the latent Gaussian process for likelihood == "gaussian". For likelihood != "gaussian", "vecchia" and "vecchia_latent" are equivalent
An integer
specifying the number of parallel threads for OMP.
If num_parallel_threads = NULL, all available threads are used
A numeric
specifying the range parameter
of the Wendland covariance function and Wendland correlation taper function.
We follow the notation of Bevilacqua et al. (2019, AOS)
A numeric
specifying the shape (=smoothness) parameter
of the Wendland covariance function and Wendland correlation taper function.
We follow the notation of Bevilacqua et al. (2019, AOS)
An integer
specifying the number of neighbors for
the Vecchia and VIF approximations. Internal default values if NULL:
20 for gp_approx = "vecchia"
30 for gp_approx = "full_scale_vecchia"
Note: for prediction, the number of neighbors can be set through the 'num_neighbors_pred' parameter in the 'set_prediction_data' function. By default, num_neighbors_pred = 2 * num_neighbors. Further, the type of Vecchia approximation used for making predictions is set through the 'vecchia_pred_type' parameter in the 'set_prediction_data' function
A string
specifying the ordering used in
the Vecchia approximation. Available options:
"none": the default ordering in the data is used
"random": a random ordering
"time": ordering according to time (only for space-time models)
"time_random_space": ordering according to time and randomly for all spatial points with the same time points (only for space-time models)
A string
specifying the method for choosing inducing points
Available options:
"kmeans++": the k-means++ algorithm
"cover_tree": the cover tree algorithm
"random": random selection from data points
An integer
specifying the number of inducing
points / knots for FITC, full_scale_tapering, and VIF approximations. Internal default values if NULL:
500 for gp_approx = "fitc" and gp_approx = "full_scale_tapering"
200 for gp_approx = "full_scale_vecchia"
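As a rough illustration of how the inducing point arguments combine, the sketch below fits a FITC approximation with 50 inducing points chosen by k-means++. The simulated data and the chosen values are assumptions for illustration only, and the fit runs only if gpboost is installed.

```r
# Simulated spatial data (hypothetical)
set.seed(1)
n <- 300
coords <- cbind(runif(n), runif(n))
y <- sin(4 * coords[, 1]) + cos(4 * coords[, 2]) + rnorm(n, sd = 0.2)
# Far fewer inducing points than data points
num_ind_points <- 50
stopifnot(num_ind_points < n)

if (requireNamespace("gpboost", quietly = TRUE)) {
  gp_model <- gpboost::fitGPModel(
    gp_coords = coords, cov_function = "matern", cov_fct_shape = 1.5,
    gp_approx = "fitc", ind_points_selection = "kmeans++",
    num_ind_points = num_ind_points,
    likelihood = "gaussian", y = y)
  summary(gp_model)
}
```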
A numeric
specifying the radius (= "spatial resolution")
for the cover tree algorithm
A string
specifying the method used for inverting covariance matrices.
Available options:
"cholesky": Cholesky factorization
"iterative": iterative methods. A combination of the conjugate gradient, the Lanczos algorithm, and other methods. This is currently only supported for the following cases:
grouped random effects with more than one level
likelihood != "gaussian" and gp_approx == "vecchia" (non-Gaussian likelihoods with a Vecchia-Laplace approximation)
likelihood != "gaussian" and gp_approx == "full_scale_vecchia" (non-Gaussian likelihoods with a VIF approximation)
likelihood == "gaussian" and gp_approx == "full_scale_tapering" (Gaussian likelihood with a full-scale tapering approximation)
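One of the supported combinations listed above, a non-Gaussian likelihood with a Vecchia approximation, might be set up as follows. This is a sketch on simulated binary data, not a tuned configuration; std_dev = FALSE is chosen here only to keep the run short, and the fit is skipped if gpboost is not installed.

```r
# Simulated binary spatial data (hypothetical)
set.seed(1)
n <- 300
coords <- cbind(runif(n), runif(n))
latent <- sin(4 * coords[, 1]) + cos(4 * coords[, 2])
y_bin <- rbinom(n, size = 1, prob = plogis(latent))
stopifnot(all(y_bin %in% c(0, 1)))

if (requireNamespace("gpboost", quietly = TRUE)) {
  gp_model <- gpboost::fitGPModel(
    gp_coords = coords, cov_function = "matern", cov_fct_shape = 1.5,
    gp_approx = "vecchia", num_neighbors = 20,
    matrix_inversion_method = "iterative",   # instead of "cholesky"
    likelihood = "bernoulli_logit", y = y_bin,
    params = list(std_dev = FALSE))          # skip standard deviations
  summary(gp_model)
}
```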
An integer
specifying the seed used for model creation
(e.g., random ordering in Vecchia approximation)
A vector
with elements indicating independent realizations of
random effects / Gaussian processes (same values = same process realization).
The elements of 'cluster_ids' can be integer, double, or character.
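A small sketch of how 'cluster_ids' could be used: the first half of the observations is treated as one realization of the Gaussian process and the second half as another, independent one. The data are simulated and the names are hypothetical; the fit runs only if gpboost is installed.

```r
# Two independent realizations observed at 100 locations each (hypothetical)
set.seed(1)
n <- 200
coords <- cbind(runif(n), runif(n))
# Same value = same process realization
cluster_ids <- rep(c(1, 2), each = n / 2)
# Different smooth signals for the two realizations plus noise
signal <- ifelse(cluster_ids == 1, sin(4 * coords[, 1]), cos(4 * coords[, 2]))
y <- signal + rnorm(n, sd = 0.2)

if (requireNamespace("gpboost", quietly = TRUE)) {
  gp_model <- gpboost::fitGPModel(
    gp_coords = coords, cov_function = "exponential",
    cluster_ids = cluster_ids,
    likelihood = "gaussian", y = y)
  summary(gp_model)
}
```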
A boolean.
If TRUE, the data (groups, coordinates, covariate data for random coefficients)
is freed in R after initialization
A vector
with response variable data
A matrix
with numeric covariate data for the
fixed effects linear regression term (if there is one)
A list
with parameters for the estimation / optimization
optimizer_cov: string
(default = "lbfgs").
Optimizer used for estimating covariance parameters.
Options: "gradient_descent", "lbfgs", "fisher_scoring", "newton", "nelder_mead".
If there are additional auxiliary parameters for non-Gaussian likelihoods,
'optimizer_cov' is also used for those
optimizer_coef: string
(default = "wls" for Gaussian likelihoods and "lbfgs" for other likelihoods).
Optimizer used for estimating linear regression coefficients, if there are any
(for the GPBoost algorithm there are usually none).
Options: "gradient_descent", "lbfgs", "wls", "nelder_mead". Gradient descent steps are done simultaneously
with gradient descent steps for the covariance parameters.
"wls" refers to doing coordinate descent for the regression coefficients using weighted least squares.
If 'optimizer_cov' is set to "nelder_mead" or "lbfgs",
'optimizer_coef' is automatically also set to the same value.
maxit: integer
(default = 1000).
Maximal number of iterations for optimization algorithm
delta_rel_conv: numeric
(default = 1E-6 except for "nelder_mead" for which the default is 1E-8).
Convergence tolerance. The algorithm stops if the relative change
in either the (approximate) log-likelihood or the parameters is below this value.
If < 0, internal default values are used
convergence_criterion: string
(default = "relative_change_in_log_likelihood").
The convergence criterion used for terminating the optimization algorithm.
Options: "relative_change_in_log_likelihood" or "relative_change_in_parameters"
init_coef: vector
with numeric
elements (default = NULL).
Initial values for the regression coefficients (if there are any, can be NULL)
init_cov_pars: vector
with numeric
elements (default = NULL).
Initial values for covariance parameters of Gaussian process and
random effects (can be NULL). The order is the same as the order
of the parameters in the summary function: first is the error variance
(only for "gaussian" likelihood), next follow the variances of the
grouped random effects (if there are any, in the order provided in 'group_data'),
and then follow the marginal variance and the range of the Gaussian process.
If there are multiple Gaussian processes, then the variances and ranges follow alternatingly.
If 'init_cov_pars = NULL', an internal choice is used that depends on the
likelihood and the random effects type and covariance function.
If you select the option 'trace = TRUE' in the 'params' argument,
you will see the first initial covariance parameters in iteration 0.
lr_coef: numeric
(default = 0.1).
Learning rate for fixed effect regression coefficients if gradient descent is used
lr_cov: numeric
(default = 0.1 for "gradient_descent" and 1. otherwise).
Initial learning rate for covariance parameters if a gradient-based optimization method is used.
If lr_cov < 0, internal default values are used (0.1 for "gradient_descent" and 1. otherwise).
If there are additional auxiliary parameters for non-Gaussian likelihoods, 'lr_cov' is also used for those.
For "lbfgs", this is divided by the norm of the gradient in the first iteration
use_nesterov_acc: boolean
(default = TRUE).
If TRUE, Nesterov acceleration is used.
This is used only for gradient descent
acc_rate_coef: numeric
(default = 0.5).
Acceleration rate for regression coefficients (if there are any)
for Nesterov acceleration
acc_rate_cov: numeric
(default = 0.5).
Acceleration rate for covariance parameters for Nesterov acceleration
momentum_offset: integer
(default = 2).
Number of iterations for which no momentum is applied in the beginning.
trace: boolean
(default = FALSE).
If TRUE, information on the progress of the parameter
optimization is printed
std_dev: boolean
(default = TRUE).
If TRUE, approximate standard deviations are calculated for the covariance and linear regression parameters
(= square root of diagonal of the inverse Fisher information for Gaussian likelihoods and
square root of diagonal of a numerically approximated inverse Hessian for non-Gaussian likelihoods)
init_aux_pars: vector
with numeric
elements (default = NULL).
Initial values for additional parameters for non-Gaussian likelihoods
(e.g., shape parameter of a gamma or negative_binomial likelihood)
estimate_aux_pars: boolean
(default = TRUE).
If TRUE, additional parameters for non-Gaussian likelihoods
are also estimated (e.g., shape parameter of a gamma or negative_binomial likelihood)
cg_max_num_it: integer
(default = 1000).
Maximal number of iterations for conjugate gradient algorithms
cg_max_num_it_tridiag: integer
(default = 1000).
Maximal number of iterations for conjugate gradient algorithm
when being run as Lanczos algorithm for tridiagonalization
cg_delta_conv: numeric
(default = 1E-2).
Tolerance level for L2 norm of residuals for checking convergence
in conjugate gradient algorithm when being used for parameter estimation
num_rand_vec_trace: integer
(default = 50).
Number of random vectors (e.g., Rademacher) for stochastic approximation of the trace of a matrix
reuse_rand_vec_trace: boolean
(default = TRUE).
If TRUE, random vectors (e.g., Rademacher) for stochastic approximations
of the trace of a matrix are sampled only once at the beginning of
the parameter estimation and reused in later trace approximations.
Otherwise they are sampled every time a trace is calculated
seed_rand_vec_trace: integer
(default = 1).
Seed number to generate random vectors (e.g., Rademacher)
cg_preconditioner_type: string.
Type of preconditioner used for conjugate gradient algorithms.
Options for grouped random effects:
"ssor" (= default): SSOR preconditioner
"incomplete_cholesky": zero fill-in incomplete Cholesky factorization
Options for likelihood != "gaussian" and gp_approx == "vecchia" or likelihood == "gaussian" and gp_approx == "vecchia_latent":
"vadu" (= default): (B^T * (D^-1 + W) * B) as preconditioner for inverting (B^T * D^-1 * B + W), where B^T * D^-1 * B approx= Sigma^-1
"fitc": FITC / modified predictive process preconditioner for inverting (B^-1 * D * B^-T + W^-1)
"pivoted_cholesky": (Lk * Lk^T + W^-1) as preconditioner for inverting (B^-1 * D * B^-T + W^-1), where Lk is a low-rank pivoted Cholesky approximation for Sigma and B^-1 * D * B^-T approx= Sigma
"incomplete_cholesky": zero fill-in incomplete (reverse) Cholesky factorization of (B^T * D^-1 * B + W) using the sparsity pattern of B^T * D^-1 * B approx= Sigma^-1
Options for likelihood != "gaussian" and gp_approx == "full_scale_vecchia":
"fitc" ( = default): FITC / modified predictive process preconditioner
"vifdu": VIF with diagonal update preconditioner
Options for likelihood == "gaussian" and gp_approx == "full_scale_tapering":
"fitc" (= default): modified predictive process preconditioner
"none": no preconditioner
fitc_piv_chol_preconditioner_rank: integer.
Rank of the FITC and pivoted Cholesky decomposition preconditioners for
iterative methods for Vecchia and VIF approximations
(for full_scale_tapering, the same inducing points as in the approximation are used).
Internal default values if NULL or < 0:
200 for the FITC preconditioner
50 for the pivoted Cholesky decomposition preconditioner
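Among the settings for iterative methods above, num_rand_vec_trace, reuse_rand_vec_trace, and seed_rand_vec_trace control a stochastic trace approximation. The base R sketch below shows one common estimator of this kind (Hutchinson's estimator with Rademacher vectors). It illustrates the general technique under that assumption and is not the package's internal code; the function name is hypothetical.

```r
# Stochastic trace estimate: average of z^T A z over random +/-1 vectors z
hutchinson_trace <- function(A, num_rand_vec = 50, seed = 1) {
  set.seed(seed)  # fixing the seed reproduces the same random vectors
  n <- nrow(A)
  # Rademacher vectors: entries are -1 or +1 with equal probability
  Z <- matrix(sample(c(-1, 1), n * num_rand_vec, replace = TRUE), nrow = n)
  # Column j of Z * (A %*% Z) sums to z_j^T A z_j
  mean(colSums(Z * (A %*% Z)))
}

# Symmetric positive semi-definite test matrix
set.seed(42)
B <- matrix(rnorm(100 * 100), 100, 100)
A <- crossprod(B) / 100
exact <- sum(diag(A))
estimate <- hutchinson_trace(A, num_rand_vec = 50, seed = 1)
c(exact = exact, estimate = estimate)
```

More random vectors reduce the variance of the estimate at the cost of more matrix-vector products, which is the trade-off that a setting such as num_rand_vec_trace governs.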
Discontinued. Use the argument gp_approx
instead
A string
specifying the type of Vecchia approximation used for making predictions.
This is discontinued here. Use the function 'set_prediction_data' to specify this
An integer
specifying the number of neighbors for making predictions.
This is discontinued here. Use the function 'set_prediction_data' to specify this
A numeric
vector
with
additional fixed effects contributions that are added to the linear predictor (= offset).
The length of this vector needs to equal the number of training data points.
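A typical use of an offset is an exposure term in a count model. The sketch below passes log(exposure) as the offset of a Poisson model with a grouped random effect. The data are simulated and all names are hypothetical; the fit runs only if gpboost is installed.

```r
# Simulated count data with known exposure (hypothetical)
set.seed(1)
n <- 300
group <- sample(1:20, n, replace = TRUE)
b <- rnorm(20, sd = 0.5)                  # random intercepts
exposure <- runif(n, min = 0.5, max = 2)
y_count <- rpois(n, lambda = exposure * exp(b[group]))
# The offset enters the linear predictor on the log scale
offset <- log(exposure)
stopifnot(length(offset) == length(y_count))

if (requireNamespace("gpboost", quietly = TRUE)) {
  gp_model <- gpboost::fitGPModel(
    group_data = group, likelihood = "poisson",
    offset = offset, y = y_count)
  summary(gp_model)
}
```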
This is discontinued. Use the renamed equivalent argument offset
instead
A numeric
specifying an additional parameter for the likelihood
which cannot be estimated for this likelihood
(e.g., degrees of freedom for likelihood = "t_fix_df").
This is not to be confused with any auxiliary parameters that can be estimated and accessed through
the function get_aux_pars
after estimation.
Note that this likelihood_additional_param
parameter is irrelevant for many likelihoods.
If likelihood_additional_param = NULL, the following internal default values are used:
df = 2 for likelihood = "t_fix_df"
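For illustration, a t-likelihood with the degrees of freedom held fixed at 5 instead of the internal default could be requested as sketched below. The data are simulated, the value 5 is an arbitrary choice, and the calls are only run if gpboost is installed.

```r
# Simulated heavy-tailed data with a grouped random effect (hypothetical)
set.seed(1)
n <- 300
group <- sample(1:20, n, replace = TRUE)
b <- rnorm(20)
y_heavy <- b[group] + 0.5 * rt(n, df = 5)
df_fixed <- 5   # held fixed, not estimated

if (requireNamespace("gpboost", quietly = TRUE)) {
  gp_model <- gpboost::fitGPModel(
    group_data = group, likelihood = "t_fix_df",
    likelihood_additional_param = df_fixed, y = y_heavy)
  summary(gp_model)
  # Estimated auxiliary parameters are a separate concept from df_fixed
  gpboost::get_aux_pars(gp_model)
}
```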
Fabio Sigrist
# See https://github.com/fabsig/GPBoost/tree/master/R-package for more examples
# \donttest{
data(GPBoost_data, package = "gpboost")
# Add intercept column
X1 <- cbind(rep(1,dim(X)[1]),X)
X_test1 <- cbind(rep(1,dim(X_test)[1]),X_test)
#--------------------Grouped random effects model: single-level random effect----------------
gp_model <- fitGPModel(group_data = group_data[,1], y = y, X = X1,
likelihood="gaussian", params = list(std_dev = TRUE))
summary(gp_model)
# Make predictions
pred <- predict(gp_model, group_data_pred = group_data_test[,1],
X_pred = X_test1, predict_var = TRUE)
pred$mu # Predicted mean
pred$var # Predicted variances
# Also predict covariance matrix
pred <- predict(gp_model, group_data_pred = group_data_test[,1],
X_pred = X_test1, predict_cov_mat = TRUE)
pred$mu # Predicted mean
pred$cov # Predicted covariance
#--------------------Two crossed random effects and a random slope----------------
gp_model <- fitGPModel(group_data = group_data, likelihood="gaussian",
group_rand_coef_data = X[,2],
ind_effect_group_rand_coef = 1,
y = y, X = X1, params = list(std_dev = TRUE))
summary(gp_model)
#--------------------Gaussian process model----------------
gp_model <- fitGPModel(gp_coords = coords, cov_function = "matern", cov_fct_shape = 1.5,
likelihood="gaussian", y = y, X = X1, params = list(std_dev = TRUE))
summary(gp_model)
# Make predictions
pred <- predict(gp_model, gp_coords_pred = coords_test,
X_pred = X_test1, predict_cov_mat = TRUE)
pred$mu # Predicted (posterior) mean of GP
pred$cov # Predicted (posterior) covariance matrix of GP
#--------------------Gaussian process model with Vecchia approximation----------------
gp_model <- fitGPModel(gp_coords = coords, cov_function = "matern", cov_fct_shape = 1.5,
gp_approx = "vecchia", num_neighbors = 20,
likelihood="gaussian", y = y)
summary(gp_model)
#--------------------Gaussian process model with random coefficients----------------
gp_model <- fitGPModel(gp_coords = coords, cov_function = "matern", cov_fct_shape = 1.5,
gp_rand_coef_data = X[,2], y=y,
likelihood = "gaussian", params = list(std_dev = TRUE))
summary(gp_model)
#--------------------Combine Gaussian process with grouped random effects----------------
gp_model <- fitGPModel(group_data = group_data,
gp_coords = coords, cov_function = "matern", cov_fct_shape = 1.5,
likelihood = "gaussian", y = y, X = X1, params = list(std_dev = TRUE))
summary(gp_model)
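#--------------------Custom optimizer settings via 'params'----------------
# Hedged sketch: gradient descent with Nesterov acceleration instead of the defaults.
# The values below are illustrative assumptions, not recommendations.
# Uses y, X1, and group_data as loaded and defined at the top of these examples.
gp_model <- fitGPModel(group_data = group_data[,1], y = y, X = X1,
likelihood="gaussian",
params = list(optimizer_cov = "gradient_descent",
optimizer_coef = "gradient_descent",
lr_cov = 0.1, lr_coef = 0.1, use_nesterov_acc = TRUE,
maxit = 100, trace = TRUE, std_dev = TRUE))
summary(gp_model)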
# }