GPModel_shared_params: Documentation for parameters shared by `GPModel`, `gpb.cv`, and `gpboost`

Description

Documentation for parameters shared by GPModel, gpb.cv, and gpboost

Arguments

likelihood

A string specifying the likelihood function (distribution) of the response variable. Available options:

"gaussian": Gaussian likelihood (identity link function)
"bernoulli_probit": binary data with Bernoulli likelihood and a probit link function
"bernoulli_logit": binary data with Bernoulli likelihood and a logit link function
"gamma": gamma distribution with a log link function
"poisson": Poisson distribution with a log link function
"negative_binomial": negative binomial distribution with a log link function
"t": t-distribution (e.g., for robust regression)
"t_fix_df": t-distribution with the degrees-of-freedom (df) held fixed and not estimated. The df can be set via the likelihood_additional_param parameter
Note: other likelihoods could be implemented upon request
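To illustrate the link functions named above, the following Python sketch maps a linear predictor eta to the mean of the response. It does not call GPBoost, and the function names are hypothetical. It only writes out the standard inverse links: probit, logit, log, and the identity for "gaussian".

```python
import math

# Hypothetical helpers (not part of GPBoost): inverse link functions that map
# a linear predictor eta to the mean of the response variable.

def inv_probit(eta):
    """Standard normal CDF, i.e., P(y = 1) under "bernoulli_probit"."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def inv_logit(eta):
    """Logistic function, i.e., P(y = 1) under "bernoulli_logit" (overflow-safe)."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def inv_log(eta):
    """Inverse of the log link: mean for "gamma", "poisson", "negative_binomial"."""
    return math.exp(eta)

INVERSE_LINK = {
    "gaussian": lambda eta: eta,  # identity link
    "bernoulli_probit": inv_probit,
    "bernoulli_logit": inv_logit,
    "gamma": inv_log,
    "poisson": inv_log,
    "negative_binomial": inv_log,
}

for name, f in INVERSE_LINK.items():
    print(f"{name:18s} eta = 0.5 -> mean = {f(0.5):.4f}")
```

The binary likelihoods map eta into (0, 1), while the log link maps it into (0, inf). This is why the same linear predictor yields very different means across likelihoods.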

likelihood_additional_param

A numeric specifying an additional likelihood parameter that is held fixed rather than estimated (e.g., the degrees of freedom for likelihood = "t_fix_df"). This should not be confused with auxiliary parameters that are estimated and can be accessed through the function get_aux_pars after estimation. Note that likelihood_additional_param is irrelevant for many likelihoods. If likelihood_additional_param = NULL, the following internal default values are used:

df = 2 for likelihood = "t_fix_df"

group_data

A vector or matrix whose columns are categorical grouping variables. Its elements are the group levels that define the grouped random effects and can be integer, double, or character. The number of columns corresponds to the number of grouped (intercept) random effects.

group_rand_coef_data

A vector or matrix with numeric covariate data for grouped random coefficients

ind_effect_group_rand_coef

A vector with integer indices that indicate the corresponding categorical grouping variable (=columns) in 'group_data' for every covariate in 'group_rand_coef_data'. Counting starts at 1. The length of this index vector must equal the number of covariates in 'group_rand_coef_data'. For instance, c(1,1,2) means that the first two covariates (=first two columns) in 'group_rand_coef_data' have random coefficients corresponding to the first categorical grouping variable (=first column) in 'group_data', and the third covariate (=third column) in 'group_rand_coef_data' has a random coefficient corresponding to the second grouping variable (=second column) in 'group_data'
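The 1-based mapping described above can be made explicit with a small Python sketch. The helper name is hypothetical and not part of GPBoost. It only reproduces the indexing convention from the text, including the c(1,1,2) example.

```python
# Hypothetical helper (not part of GPBoost): show which grouping variable
# (column of group_data) each random-coefficient covariate
# (column of group_rand_coef_data) is attached to. All indices are 1-based.

def describe_rand_coef_mapping(ind_effect_group_rand_coef, num_group_vars):
    """Return {grouping variable: [covariate columns]}."""
    mapping = {k: [] for k in range(1, num_group_vars + 1)}
    for cov_col, group_var in enumerate(ind_effect_group_rand_coef, start=1):
        if not 1 <= group_var <= num_group_vars:
            raise ValueError(f"index {group_var} does not refer to a column of group_data")
        mapping[group_var].append(cov_col)
    return mapping

# The example from the text: c(1,1,2) with two grouping variables
print(describe_rand_coef_mapping([1, 1, 2], num_group_vars=2))
# {1: [1, 2], 2: [3]}
```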

drop_intercept_group_rand_effect

A vector of type logical (boolean) indicating whether intercept random effects are dropped; this applies only to grouping variables that have random coefficients. If drop_intercept_group_rand_effect[k] is TRUE, the intercept random effect of the k-th grouping variable (= k-th column of 'group_data') is dropped / not included. Only random effects with random slopes can be dropped.
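The rule that only intercepts of grouping variables with random slopes can be dropped can be expressed as a check. The following Python sketch uses a hypothetical helper, not a GPBoost function. It combines this argument with 'ind_effect_group_rand_coef', using 1-based indices as above.

```python
# Hypothetical validation helper (not part of GPBoost).

def check_drop_intercept(drop_intercept_group_rand_effect, ind_effect_group_rand_coef):
    """Raise if an intercept is dropped for a grouping variable without random slopes."""
    has_slope = set(ind_effect_group_rand_coef)  # grouping variables with >= 1 random coefficient
    for k, drop in enumerate(drop_intercept_group_rand_effect, start=1):
        if drop and k not in has_slope:
            raise ValueError(
                f"grouping variable {k} has no random coefficient; its intercept cannot be dropped"
            )
    return True

# Dropping the intercept of grouping variable 2 is allowed when covariate 3 has a
# random slope for it, as in ind_effect_group_rand_coef = c(1,1,2)
print(check_drop_intercept([False, True], [1, 1, 2]))  # True
```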

gp_coords

A matrix with numeric coordinates (= inputs / features) for defining Gaussian processes

gp_rand_coef_data

A vector or matrix with numeric covariate data for Gaussian process random coefficients

cov_function

A string specifying the covariance function for the Gaussian process. Available options:

"exponential": Exponential covariance function (using the parametrization of Diggle and Ribeiro, 2007)
"gaussian": Gaussian, aka squared exponential, covariance function (using the parametrization of Diggle and Ribeiro, 2007)
"matern": Matern covariance function with the smoothness specified by the cov_fct_shape parameter (using the parametrization of Rasmussen and Williams, 2006)
"powered_exponential": powered exponential covariance function with the exponent specified by the cov_fct_shape parameter (using the parametrization of Diggle and Ribeiro, 2007)
"wendland": Compactly supported Wendland covariance function (using the parametrization of Bevilacqua et al., 2019, AOS)
"matern_space_time": Spatio-temporal Matern covariance function with different range parameters for space and time. Note that the first column in gp_coords must correspond to the time dimension
"matern_ard": anisotropic Matern covariance function with Automatic Relevance Determination (ARD), i.e., with a different range parameter for every coordinate dimension / column of gp_coords
"gaussian_ard": anisotropic Gaussian, aka squared exponential, covariance function with Automatic Relevance Determination (ARD), i.e., with a different range parameter for every coordinate dimension / column of gp_coords
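For orientation, the following Python sketch writes out several of these covariance functions in terms of a distance d, a marginal variance sigma2, and a range rho. It uses the textbook forms of the cited parametrizations: exp(-d/rho), exp(-(d/rho)^2), and exp(-(d/rho)^shape) for Diggle and Ribeiro (2007), and the half-integer Matern forms of Rasmussen and Williams (2006). It is an illustration only, not GPBoost's internal code, and the internal scaling of the range parameter may differ.

```python
import math

def exponential(d, sigma2, rho):
    return sigma2 * math.exp(-d / rho)

def gaussian(d, sigma2, rho):
    return sigma2 * math.exp(-((d / rho) ** 2))

def powered_exponential(d, sigma2, rho, shape):
    # shape plays the role of cov_fct_shape (the exponent)
    return sigma2 * math.exp(-((d / rho) ** shape))

def matern(d, sigma2, rho, shape):
    # Closed forms for half-integer smoothness (shape = cov_fct_shape)
    if shape == 0.5:
        return sigma2 * math.exp(-d / rho)
    if shape == 1.5:
        r = math.sqrt(3.0) * d / rho
        return sigma2 * (1.0 + r) * math.exp(-r)
    if shape == 2.5:
        r = math.sqrt(5.0) * d / rho
        return sigma2 * (1.0 + r + r * r / 3.0) * math.exp(-r)
    raise NotImplementedError("general shapes need the modified Bessel function K_nu")

def gaussian_ard(x, y, sigma2, rhos):
    # ARD: one range parameter per coordinate dimension (column of gp_coords)
    s = sum(((xi - yi) / r) ** 2 for xi, yi, r in zip(x, y, rhos))
    return sigma2 * math.exp(-s)

print(matern(0.2, 1.0, 0.1, 1.5))
```

Two consequences follow from these forms. The Matern function with shape 0.5 coincides with the exponential one. With equal ranges in every dimension, "gaussian_ard" reduces to the isotropic "gaussian" function of the Euclidean distance.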

cov_fct_shape

A numeric specifying the shape parameter of the covariance function (= smoothness parameter for the Matern covariance). This parameter is irrelevant for some covariance functions such as the exponential or Gaussian.

gp_approx

A string specifying the large data approximation for Gaussian processes. Available options:

"none": No approximation
"vecchia": A Vecchia approximation; see Sigrist (2022, JMLR) for more details
"tapering": The covariance function is multiplied by a compactly supported Wendland correlation function
"fitc": Fully Independent Training Conditional approximation aka modified predictive process approximation; see Gyger, Furrer, and Sigrist (2024) for more details
"full_scale_tapering": A full scale approximation combining an inducing point / predictive process approximation with tapering on the residual process; see Gyger, Furrer, and Sigrist (2024) for more details
"vecchia_latent": similar to "vecchia", but for likelihood == "gaussian" the Vecchia approximation is applied to the latent Gaussian process. For likelihood != "gaussian", "vecchia" and "vecchia_latent" are equivalent

num_parallel_threads

An integer specifying the number of parallel threads for OMP. If num_parallel_threads = NULL, all available threads are used

cov_fct_taper_range

A numeric specifying the range parameter of the Wendland covariance function and Wendland correlation taper function. We follow the notation of Bevilacqua et al. (2019, AOS)

cov_fct_taper_shape

A numeric specifying the shape (=smoothness) parameter of the Wendland covariance function and Wendland correlation taper function. We follow the notation of Bevilacqua et al. (2019, AOS)
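The following Python sketch shows how tapering makes distant pairs exactly uncorrelated. It implements the generalized Wendland correlation with closed forms for smoothness 0, 1, and 2, following the form given by Bevilacqua et al. (2019), and multiplies an exponential covariance by it. How cov_fct_taper_shape maps onto the smoothness index, and the value of the second shape parameter mu, are assumptions made for illustration. They are not taken from GPBoost.

```python
import math

def wendland(d, taper_range, smoothness, mu):
    """Generalized Wendland correlation; exactly zero for d >= taper_range.
    smoothness (kappa) in {0, 1, 2}; positive definiteness in R^dim requires
    mu >= (dim + 1) / 2 + kappa (assumed here, see Bevilacqua et al., 2019)."""
    r = d / taper_range
    if r >= 1.0:
        return 0.0
    base = 1.0 - r
    if smoothness == 0:
        return base ** mu
    if smoothness == 1:
        return base ** (mu + 1) * (1.0 + (mu + 1) * r)
    if smoothness == 2:
        return base ** (mu + 2) * (1.0 + (mu + 2) * r + ((mu + 2) ** 2 - 1) * r * r / 3.0)
    raise NotImplementedError("only smoothness 0, 1, 2 are sketched here")

def tapered_exponential(d, sigma2, rho, taper_range, smoothness=1, mu=3.0):
    # Tapering multiplies the covariance by a compactly supported correlation,
    # so pairs farther apart than taper_range get zero covariance (sparse matrix).
    return sigma2 * math.exp(-d / rho) * wendland(d, taper_range, smoothness, mu)

print([round(tapered_exponential(d, 1.0, 0.5, 1.0), 4) for d in (0.0, 0.5, 1.0, 2.0)])
```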

num_neighbors

An integer specifying the number of neighbors for the Vecchia approximation. Note: for prediction, the number of neighbors can be set through the 'num_neighbors_pred' parameter in the 'set_prediction_data' function. By default, num_neighbors_pred = 2 * num_neighbors. Further, the type of Vecchia approximation used for making predictions is set through the 'vecchia_pred_type' parameter in the 'set_prediction_data' function
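Conceptually, a Vecchia approximation conditions each point only on up to num_neighbors previously ordered points. The following Python sketch uses brute-force nearest-neighbor search under Euclidean distance as a simple stand-in. It is an assumption for illustration. GPBoost's actual neighbor search, and how it treats non-isotropic covariance functions, may differ.

```python
import math

def vecchia_neighbors(coords, num_neighbors):
    """For each point i (in the given order), return the indices of the
    num_neighbors nearest points among the points ordered before i.
    Brute force, O(n^2 log n): for illustration only."""
    neighbors = []
    for i, xi in enumerate(coords):
        previous = sorted(range(i), key=lambda j: math.dist(xi, coords[j]))
        neighbors.append(previous[:num_neighbors])
    return neighbors

coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 2.0)]
print(vecchia_neighbors(coords, num_neighbors=2))
```

Because each conditioning set only contains earlier points, the ordering (see vecchia_ordering) directly affects which neighbors are selected.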

vecchia_ordering

A string specifying the ordering used in the Vecchia approximation. Available options:

"none": the default ordering in the data is used
"random": a random ordering
"time": ordering according to time (only for space-time models)
"time_random_space": ordering according to time and randomly for all spatial points with the same time points (only for space-time models)
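The following Python sketch shows what the four ordering options amount to as permutations of the data points. The helper is hypothetical. The random orderings draw from a seeded generator, mirroring the role of the seed argument. Breaking ties in "time" by the original data order is an assumption.

```python
import random

def vecchia_order(times, ordering="none", seed=0):
    """Return a permutation of range(n) for the given ordering option (hypothetical helper)."""
    n = len(times)
    rng = random.Random(seed)
    if ordering == "none":
        return list(range(n))
    if ordering == "random":
        order = list(range(n))
        rng.shuffle(order)
        return order
    if ordering == "time":
        return sorted(range(n), key=lambda i: times[i])  # stable sort: ties keep data order
    if ordering == "time_random_space":
        tiebreak = [rng.random() for _ in range(n)]  # random order within equal time points
        return sorted(range(n), key=lambda i: (times[i], tiebreak[i]))
    raise ValueError(f"unknown ordering: {ordering}")

print(vecchia_order([2, 1, 2, 1, 3], "time"))  # [1, 3, 0, 2, 4]
```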

ind_points_selection

A string specifying the method for choosing inducing points. Available options:

"kmeans++": the k-means++ algorithm
"cover_tree": the cover tree algorithm
"random": random selection from data points
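The following Python sketch illustrates the "kmeans++" option by showing the k-means++ seeding step. The first center is drawn uniformly from the data, and each further center is drawn with probability proportional to the squared distance to the nearest center chosen so far. It covers only the seeding step. Whether GPBoost follows it with Lloyd iterations (which would move the inducing points off the data points) is not stated here.

```python
import math
import random

def kmeanspp_seeds(points, num_ind_points, seed=0):
    """k-means++ seeding: returns up to num_ind_points centers chosen among the data points."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    # squared distance of every point to its nearest chosen center
    d2 = [math.dist(p, centers[0]) ** 2 for p in points]
    while len(centers) < num_ind_points:
        if sum(d2) == 0.0:  # every point coincides with a chosen center
            break
        idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        d2 = [min(old, math.dist(p, points[idx]) ** 2) for old, p in zip(d2, points)]
    return centers

pts = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
print(kmeanspp_seeds(pts, num_ind_points=3, seed=1))
```

Already chosen points have weight zero, so the same point is never selected twice. Points far from existing centers are favored, which spreads the inducing points over the domain.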

num_ind_points

An integer specifying the number of inducing points / knots for, e.g., a predictive process approximation

cover_tree_radius

A numeric specifying the radius (= "spatial resolution") for the cover tree algorithm

matrix_inversion_method

A string specifying the method used for inverting covariance matrices. Available options:

"cholesky": Cholesky factorization
"iterative": iterative methods. A combination of conjugate gradient, Lanczos algorithm, and other methods. This is currently only supported for the following cases:
- likelihood != "gaussian" and gp_approx == "vecchia" (non-Gaussian likelihoods with a Vecchia-Laplace approximation)
- likelihood == "gaussian" and gp_approx == "full_scale_tapering" (Gaussian likelihood with a full-scale tapering approximation)
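Iterative methods replace an explicit factorization with repeated matrix-vector products. The following Python sketch shows the core of such methods: plain, unpreconditioned conjugate gradient for a symmetric positive definite system. It stops when the L2 norm of the residual falls below a tolerance, mirroring the role of cg_delta_conv and cg_max_num_it in 'params'. GPBoost additionally uses preconditioners (see cg_preconditioner_type) and Lanczos-based methods that are not shown here.

```python
def conjugate_gradient(A, b, delta_conv=1e-3, max_num_it=1000):
    """Solve A x = b for a symmetric positive definite A given as a list of lists."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = list(b)  # residual b - A x for the starting value x = 0
    p = list(r)  # first search direction
    rs = dot(r, r)
    for _ in range(max_num_it):
        if rs ** 0.5 < delta_conv:  # converged: L2 norm of residual below tolerance
            break
        Ap = matvec(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]  # new A-conjugate direction
        rs = rs_new
    return x

print(conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], delta_conv=1e-10))
# approximately [1/11, 7/11]
```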

seed

An integer specifying the seed used for model creation (e.g., random ordering in Vecchia approximation)

vecchia_pred_type

A string specifying the type of Vecchia approximation used for making predictions. Default value if vecchia_pred_type = NULL: "order_obs_first_cond_obs_only". Available options:

"order_obs_first_cond_obs_only": Vecchia approximation for the observable process and observed training data is ordered first and the neighbors are only observed training data points
"order_obs_first_cond_all": Vecchia approximation for the observable process and observed training data is ordered first and the neighbors are selected among all points (training + prediction)
"latent_order_obs_first_cond_obs_only": Vecchia approximation for the latent process and observed data is ordered first and neighbors are only observed points
"latent_order_obs_first_cond_all": Vecchia approximation for the latent process and observed data is ordered first and neighbors are selected among all points
"order_pred_first": Vecchia approximation for the observable process and prediction data is ordered first for making predictions. This option is only available for Gaussian likelihoods

num_neighbors_pred

An integer specifying the number of neighbors for the Vecchia approximation for making predictions. Default value if NULL: num_neighbors_pred = 2 * num_neighbors

cg_delta_conv_pred

A numeric specifying the tolerance level for the L2 norm of residuals for checking convergence in conjugate gradient algorithms when used for prediction. Default value if NULL: 1e-3

nsim_var_pred

An integer specifying the number of samples when simulation is used for calculating predictive variances. Default value if NULL: 1000

rank_pred_approx_matrix_lanczos

An integer specifying the rank of the matrix for approximating predictive covariances obtained using the Lanczos algorithm. Default value if NULL: 1000

cluster_ids

A vector with elements indicating independent realizations of random effects / Gaussian processes (same values = same process realization). The elements of 'cluster_ids' can be integer, double, or character.
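Independent realizations mean that observations with different cluster_ids have zero covariance, even when their coordinates coincide. As a result, the covariance matrix is block diagonal over clusters. The following Python sketch builds such a matrix with an exponential covariance. The helper and the choice of covariance function are illustrative assumptions.

```python
import math

def cluster_covariance(coords, cluster_ids, sigma2, rho):
    """Exponential covariance where points with different cluster_ids are independent."""
    n = len(coords)
    return [
        [
            sigma2 * math.exp(-math.dist(coords[i], coords[j]) / rho)
            if cluster_ids[i] == cluster_ids[j]
            else 0.0  # different realizations of the process: no correlation
            for j in range(n)
        ]
        for i in range(n)
    ]

K = cluster_covariance([(0.0,), (1.0,), (0.0,)], ["a", "a", "b"], sigma2=2.0, rho=1.0)
print(K[0][2])  # 0.0: same location, different cluster
```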

free_raw_data

A boolean. If TRUE, the data (groups, coordinates, covariate data for random coefficients) is freed in R after initialization

y

A vector with response variable data

X

A matrix with numeric covariate data for the fixed effects linear regression term (if there is one)

params

A list with parameters for the estimation / optimization

optimizer_cov: string (default = "lbfgs"). Optimizer used for estimating covariance parameters. Options: "gradient_descent", "lbfgs", "fisher_scoring", "newton", "nelder_mead". If there are additional auxiliary parameters for non-Gaussian likelihoods, 'optimizer_cov' is also used for those
optimizer_coef: string (default = "wls" for Gaussian likelihoods and "lbfgs" for other likelihoods). Optimizer used for estimating linear regression coefficients, if there are any (for the GPBoost algorithm there are usually none). Options: "gradient_descent", "lbfgs", "wls", "nelder_mead". Gradient descent steps are done simultaneously with gradient descent steps for the covariance parameters. "wls" refers to doing coordinate descent for the regression coefficients using weighted least squares. If 'optimizer_cov' is set to "nelder_mead" or "lbfgs", 'optimizer_coef' is automatically also set to the same value.
maxit: integer (default = 1000). Maximal number of iterations for optimization algorithm
delta_rel_conv: numeric (default = 1E-6 except for "nelder_mead" for which the default is 1E-8). Convergence tolerance. The algorithm stops if the relative change in either the (approximate) log-likelihood or the parameters is below this value. If < 0, internal default values are used
convergence_criterion: string (default = "relative_change_in_log_likelihood"). The convergence criterion used for terminating the optimization algorithm. Options: "relative_change_in_log_likelihood" or "relative_change_in_parameters"
init_coef: vector with numeric elements (default = NULL). Initial values for the regression coefficients (if there are any, can be NULL)
init_cov_pars: vector with numeric elements (default = NULL). Initial values for covariance parameters of Gaussian process and random effects (can be NULL). The order is the same as the order of the parameters in the summary function: first is the error variance (only for "gaussian" likelihood), next follow the variances of the grouped random effects (if there are any, in the order provided in 'group_data'), and then follow the marginal variance and the range of the Gaussian process. If there are multiple Gaussian processes, then the variances and ranges follow alternatingly. If 'init_cov_pars = NULL', an internal choice is used that depends on the likelihood, the random effects type, and the covariance function. If you set 'trace = TRUE' in the 'params' argument, the initial covariance parameters are printed in iteration 0.
lr_coef: numeric (default = 0.1). Learning rate for fixed effect regression coefficients if gradient descent is used
lr_cov: numeric (default = 0.1 for "gradient_descent" and 1. otherwise). Initial learning rate for covariance parameters if a gradient-based optimization method is used
- If lr_cov < 0, internal default values are used (0.1 for "gradient_descent" and 1. otherwise)
- If there are additional auxiliary parameters for non-Gaussian likelihoods, 'lr_cov' is also used for those
- For "lbfgs", this is divided by the norm of the gradient in the first iteration
use_nesterov_acc: boolean (default = TRUE). If TRUE, Nesterov acceleration is used. This applies only to gradient descent
acc_rate_coef: numeric (default = 0.5). Acceleration rate for regression coefficients (if there are any) for Nesterov acceleration
acc_rate_cov: numeric (default = 0.5). Acceleration rate for covariance parameters for Nesterov acceleration
momentum_offset: integer (default = 2). Number of iterations for which no momentum is applied in the beginning.
trace: boolean (default = FALSE). If TRUE, information on the progress of the parameter optimization is printed
std_dev: boolean (default = TRUE). If TRUE, approximate standard deviations are calculated for the covariance and linear regression parameters (= square root of diagonal of the inverse Fisher information for Gaussian likelihoods and square root of diagonal of a numerically approximated inverse Hessian for non-Gaussian likelihoods)
init_aux_pars: vector with numeric elements (default = NULL). Initial values for additional parameters for non-Gaussian likelihoods (e.g., shape parameter of a gamma or negative_binomial likelihood)
estimate_aux_pars: boolean (default = TRUE). If TRUE, additional parameters for non-Gaussian likelihoods are also estimated (e.g., shape parameter of a gamma or negative_binomial likelihood)
cg_max_num_it: integer (default = 1000). Maximal number of iterations for conjugate gradient algorithms
cg_max_num_it_tridiag: integer (default = 1000). Maximal number of iterations for conjugate gradient algorithm when being run as Lanczos algorithm for tridiagonalization
cg_delta_conv: numeric (default = 1E-2). Tolerance level for the L2 norm of residuals for checking convergence in conjugate gradient algorithms when used for parameter estimation
num_rand_vec_trace: integer (default = 50). Number of random vectors (e.g., Rademacher) for stochastic approximation of the trace of a matrix
reuse_rand_vec_trace: boolean (default = TRUE). If TRUE, random vectors (e.g., Rademacher) for stochastic approximations of the trace of a matrix are sampled only once at the beginning of the parameter estimation and reused in later trace approximations. Otherwise they are sampled every time a trace is calculated
seed_rand_vec_trace: integer (default = 1). Seed number to generate random vectors (e.g., Rademacher)
piv_chol_rank: integer (default = 50). Rank of the pivoted Cholesky decomposition used as preconditioner in conjugate gradient algorithms
cg_preconditioner_type: string. Type of preconditioner used for conjugate gradient algorithms.
- Options for likelihood != "gaussian" and gp_approx == "vecchia" or likelihood == "gaussian" and gp_approx == "vecchia_latent":
  - "vadu" (= default): (B^T * (D^-1 + W) * B) as preconditioner for inverting (B^T * D^-1 * B + W), where B^T * D^-1 * B approx= Sigma^-1
  - "fitc": modified predictive process preconditioner for inverting (B^-1 * D * B^-T + W^-1)
  - "pivoted_cholesky": (Lk * Lk^T + W^-1) as preconditioner for inverting (B^-1 * D * B^-T + W^-1), where Lk is a low-rank pivoted Cholesky approximation for Sigma and B^-1 * D * B^-T approx= Sigma
  - "incomplete_cholesky": zero fill-in incomplete (reverse) Cholesky factorization of (B^T * D^-1 * B + W) using the sparsity pattern of B^T * D^-1 * B approx= Sigma^-1
- Options for likelihood == "gaussian" and gp_approx == "full_scale_tapering":
  - "fitc" (= default): modified predictive process preconditioner
  - "none": no preconditioner
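The stochastic trace approximation controlled by num_rand_vec_trace and seed_rand_vec_trace can be illustrated with a Hutchinson-type estimator in Python. The estimate is tr(A) ≈ mean of z^T A z over Rademacher vectors z. This is a sketch of the general technique, not GPBoost's implementation. Re-creating the generator from the same seed on every call draws the same vectors each time, which is similar in effect to reuse_rand_vec_trace = TRUE.

```python
import random

def hutchinson_trace(matvec, n, num_rand_vec_trace=50, seed_rand_vec_trace=1):
    """Unbiased stochastic estimate of tr(A) using Rademacher probe vectors.
    matvec(v) must return A v as a list."""
    rng = random.Random(seed_rand_vec_trace)
    total = 0.0
    for _ in range(num_rand_vec_trace):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # Rademacher vector
        Az = matvec(z)
        total += sum(zi * azi for zi, azi in zip(z, Az))  # z^T A z
    return total / num_rand_vec_trace

A = [[2.0, 1.0], [1.0, 3.0]]  # true trace = 5
est = hutchinson_trace(lambda v: [sum(a * vj for a, vj in zip(row, v)) for row in A], n=2)
print(est)
```

Only matrix-vector products are needed, which is what makes this compatible with the iterative methods above. The variance of the estimate decreases as num_rand_vec_trace grows.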

offset

A numeric vector with additional fixed effects contributions that are added to the linear predictor (= offset). The length of this vector needs to equal the number of training data points.
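The following Python sketch shows where the offset enters, using a linear fixed-effects term: eta_i = x_i^T beta + offset_i + b_i, where b_i is the random-effects contribution. The helper is hypothetical. In the GPBoost algorithm, the fixed-effects part usually comes from trees rather than from a linear term.

```python
# Hypothetical helper (not part of GPBoost): linear predictor with an offset.

def linear_predictor(X, coef, offset, random_effects):
    """eta_i = x_i^T beta + offset_i + b_i for every training data point i."""
    n = len(random_effects)
    if len(offset) != n or len(X) != n:
        raise ValueError("offset and X must have one entry/row per training data point")
    return [
        sum(xij * bj for xij, bj in zip(X[i], coef)) + offset[i] + random_effects[i]
        for i in range(n)
    ]

print(linear_predictor([[1.0, 2.0], [1.0, 0.0]], [0.5, 1.0], [0.1, -0.1], [0.0, 1.0]))
# approximately [2.6, 1.4]
```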

fixed_effects

This is discontinued. Use the renamed equivalent argument offset instead

group_data_pred

A vector or matrix with elements being group levels for which predictions are made (if there are grouped random effects in the GPModel)

group_rand_coef_data_pred

A vector or matrix with covariate data for grouped random coefficients (if there are some in the GPModel)

gp_coords_pred

A matrix with prediction coordinates (=features) for Gaussian process (if there is a GP in the GPModel)

gp_rand_coef_data_pred

A vector or matrix with covariate data for Gaussian process random coefficients (if there are some in the GPModel)

cluster_ids_pred

A vector with elements indicating the realizations of random effects / Gaussian processes for which predictions are made (set to NULL if you have not specified this when creating the GPModel)

X_pred

A matrix with prediction covariate data for the fixed effects linear regression term (if there is one in the GPModel)

predict_cov_mat

A boolean. If TRUE, the (posterior) predictive covariance is calculated in addition to the (posterior) predictive mean

predict_var

A boolean. If TRUE, the (posterior) predictive variances are calculated

vecchia_approx

Discontinued. Use the argument gp_approx instead