Fit: Fit a GaSP model.

Description

Fit (train) a GaSP model.

Usage

Fit(
  x,
  y,
  reg_model,
  sp_model = NULL,
  cor_family = c("PowerExponential", "Matern"),
  cor_par = data.frame(0),
  random_error = c(FALSE, TRUE),
  sp_var = -1,
  error_var = -1,
  nugget = 1e-09,
  tries = 10,
  seed = 500,
  fit_objective = c("Likelihood", "Posterior"),
  theta_standardized_min = 0,
  theta_standardized_max = .Machine$double.xmax,
  alpha_min = 0,
  alpha_max = 1,
  derivatives_min = 0,
  derivatives_max = 3,
  log_obj_tol = 1e-05,
  log_obj_diff = 0,
  lambda_prior = 0.1,
  model_comparison = c("Objective", "CV")
)

Value

A GaSPModel object, which is a list with the following components:

x: The data frame containing the input training data.
y: The training output data, now as a vector.
reg_model: The regression model, now in the form of a data frame.
sp_model: The stochastic process model, now in the form of a data frame.
cor_family: The correlation family.
cor_par: A data frame for the estimated correlation parameters.
random_error: A boolean indicating whether the model includes a random error term.
sp_var: The estimated stochastic process variance.
error_var: The estimated random error variance.
beta: A data frame holding the estimated regression-model parameters.
objective: The maximum value found for the objective function: the log likelihood (fit_objective = "Likelihood") or the log posterior (fit_objective = "Posterior").
cond_num: The condition number.
CVRMSE: The leave-one-out cross-validation root mean squared error.

Arguments

x: A data frame containing the input (explanatory variable) training data.
y: A vector or a data frame with one column containing the output (response) training data.
reg_model: The regression model, specified as a formula, but note the left-hand side of the formula is unused; see example.
sp_model: An optional stochastic process model, specified as a formula, but note the left-hand side of the formula and the intercept are unused. The default NULL uses all column names in x.
cor_family: A character string specifying the (product, anisotropic) correlation-function family: "PowerExponential" for the power-exponential family or "Matern" for the Matern family.
cor_par: An optional data frame containing the correlation parameters with one row per sp_model term and two columns set up as described in GaSPModel Details; only used to start the first objective optimization (see Details).
random_error: A boolean indicating whether to include a random (measurement, white-noise) error term.
sp_var, error_var: Starting values of the stochastic process and error variances for the first try to optimize the objective (see Details); valid (i.e., nonnegative) values are used only if random_error = TRUE. The invalid default value of -1 indicates that Fit will choose a starting value.
nugget: For numerical stability, the proportion of the total variance due to random error is fixed at this value (random_error = FALSE) or bounded below by it (random_error = TRUE).
tries: Number of optimizations of the objective from different random starting points.
seed: The random-number seed to generate starting points.
fit_objective: The objective that Fit attempts to optimize: "Likelihood" (maximum likelihood estimation) or "Posterior" (Bayesian maximum a posteriori estimation).
theta_standardized_min, theta_standardized_max: The minimum and maximum of the standardized \(\theta\) parameter (see Details).
alpha_min, alpha_max: The minimum and maximum of the \(\alpha\) parameter of power-exponential.
derivatives_min, derivatives_max: The minimum and maximum of the \(\delta\) parameter of Matern.
log_obj_tol: An absolute tolerance for terminating the optimization of the log of the objective.
log_obj_diff: The critical value for the change in the log objective for informal tests during optimization of correlation parameters. No testing is done with the default of 0; a larger critical value such as 2 may give a more parsimonious model.
lambda_prior: The rate parameter of an exponential prior for each \(\theta\) parameter; used only if fit_objective = "Posterior".
model_comparison: The criterion used to select from multiple solutions when tries > 1: the objective function ("Objective") or leave-one-out cross-validation ("CV").

Details

Fit numerically optimizes the profile objective function with respect to the correlation parameters; the mean and overall variance parameters are estimated in closed form given the correlation parameters.
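The idea of a profile objective can be sketched in base R. The following is only an illustration under stated assumptions, not the package's internal code: it assumes a squared-exponential correlation with a single fixed, hypothetical theta, a constant regression model, and the usual generalized least squares and maximum likelihood formulas (as in Sacks et al., 1989). All variable names and data are made up for the example.

```r
# Hypothetical 1-D training data (not from the package).
x <- seq(0, 1, length.out = 8)
y <- sin(2 * pi * x)
n <- length(y)

# Assumed correlation: squared exponential with a fixed, hypothetical theta.
theta <- 20
R <- exp(-theta * outer(x, x, "-")^2)
R <- R + diag(1e-9, n)  # small jitter on the diagonal for numerical stability

# Constant regression model (~ 1): the design matrix is a column of ones.
F_mat <- matrix(1, nrow = n, ncol = 1)

# Closed-form estimates of the mean and overall variance, given R.
beta_hat <- solve(t(F_mat) %*% solve(R, F_mat), t(F_mat) %*% solve(R, y))
resid <- y - F_mat %*% beta_hat
sigma2_hat <- as.numeric(t(resid) %*% solve(R, resid)) / n

# Profile log likelihood (up to an additive constant): a function of theta only.
log_det_R <- as.numeric(determinant(R, logarithm = TRUE)$modulus)
profile_loglik <- -0.5 * (n * log(sigma2_hat) + log_det_R)
```

A numerical optimizer would then search over theta alone, recomputing beta_hat and sigma2_hat at each candidate value.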

A cor_par data frame supplied by the user is the starting point for the first optimization try. If random_error = TRUE, then sp_var / (sp_var + error_var) is another correlation parameter to be optimized; sp_var and error_var values supplied by the user will initialize this parameter for the first try.
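As a rough illustration of how user-supplied variances translate into a starting value, the sketch below computes the ratio for hypothetical values sp_var = 4 and error_var = 1, then passes them to Fit. The numbers are arbitrary, tries = 1 is chosen only to keep the run short, and the call is skipped if the GaSP package is not installed.

```r
# Hypothetical starting variances (arbitrary illustrative values).
sp_var_start <- 4
error_var_start <- 1

# Implied starting value of the extra correlation parameter.
ratio_start <- sp_var_start / (sp_var_start + error_var_start)

# Pass the starting variances to Fit; they matter only when random_error = TRUE.
if (requireNamespace("GaSP", quietly = TRUE)) {
  fit_noisy <- GaSP::Fit(
    x = GaSP::borehole$x, y = GaSP::borehole$y, reg_model = ~1,
    random_error = TRUE,
    sp_var = sp_var_start, error_var = error_var_start,
    tries = 1
  )
}
```

Only the ratio initializes the optimization here; the returned sp_var and error_var are the estimates, not the starting values.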

Set random_error = TRUE to estimate the variance of the random (measurement, white-noise) error; the small nugget error variance serves only to ensure numerical stability.

For term \(j\) in the stochastic-process model, the estimate of \(\theta_j\) is constrained between theta_standardized_min / \(r_j^2\) and theta_standardized_max / \(r_j^2\), where \(r_j\) is the range of term \(j\). Note that Fit returns unscaled estimates relating to the original, unscaled inputs.
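The scaling of the bounds can be worked through with a small base R sketch. The data frame, its column names, and the finite cap of 50 are hypothetical, and the sketch assumes each stochastic-process term is a single input column so that its range is simply max minus min.

```r
# Hypothetical inputs; column names and values are illustrative only.
x <- data.frame(a = c(0, 5, 10), b = c(100, 150, 300))

# Range r_j of each input column (assumes one sp_model term per column).
r <- sapply(x, function(col) diff(range(col)))

theta_standardized_min <- 0
theta_standardized_max <- 50  # hypothetical finite cap, not the default

# Bounds on the unscaled theta_j implied by the standardized limits.
theta_lower <- theta_standardized_min / r^2
theta_upper <- theta_standardized_max / r^2
```

Under these assumptions, an input with a wider range receives a proportionally tighter upper bound on its unscaled theta.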

References

Sacks, J., Welch, W.J., Mitchell, T.J., and Wynn, H.P. (1989) "Design and Analysis of Computer Experiments", Statistical Science, 4, pp. 409-423, doi:10.1214/ss/1177012413.

Examples

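library(GaSP)

# Training inputs and output from the borehole data set.
x <- borehole$x
y <- borehole$y

# Constant regression model; the left-hand side of the formula is unused.
borehole_fit <- Fit(
  reg_model = ~1, x = x, y = y, cor_family = "Matern",
  random_error = FALSE, nugget = 0, fit_objective = "Posterior"
)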
