
GaSP (version 1.0.6)

Fit: Fit a GaSP model.

Description

Fit (train) a GaSP model.

Usage

Fit(
  x,
  y,
  reg_model,
  sp_model = NULL,
  cor_family = c("PowerExponential", "Matern"),
  cor_par = data.frame(0),
  random_error = c(FALSE, TRUE),
  sp_var = -1,
  error_var = -1,
  nugget = 1e-09,
  tries = 10,
  seed = 500,
  fit_objective = c("Likelihood", "Posterior"),
  theta_standardized_min = 0,
  theta_standardized_max = .Machine$double.xmax,
  alpha_min = 0,
  alpha_max = 1,
  derivatives_min = 0,
  derivatives_max = 3,
  log_obj_tol = 1e-05,
  log_obj_diff = 0,
  lambda_prior = 0.1,
  model_comparison = c("Objective", "CV")
)

Value

A GaSPModel object, which is a list with the following components:

x

The data frame containing the input training data.

y

The training output data, now as a vector.

reg_model

The regression model, now in the form of a data frame.

sp_model

The stochastic process model, now in the form of a data frame.

cor_family

The correlation family.

cor_par

A data frame for the estimated correlation parameters.

random_error

A logical value indicating whether the model includes a random error term.

sp_var

The estimated stochastic process variance.

error_var

The estimated random error variance.

beta

A data frame holding the estimated regression-model parameters.

objective

The maximum value found for the objective function: the log likelihood (fit_objective = "Likelihood") or the log posterior (fit_objective = "Posterior").

cond_num

The condition number.

CVRMSE

The leave-one-out cross-validation root mean squared error.

Arguments

x

A data frame containing the input (explanatory variable) training data.

y

A vector or a data frame with one column containing the output (response) training data.

reg_model

The regression model, specified as a formula. The left-hand side of the formula is unused; see Examples.

sp_model

An optional stochastic process model, specified as a formula. The left-hand side of the formula and the intercept are unused. The default NULL uses all column names in x.

cor_family

A character string specifying the (product, anisotropic) correlation-function family: "PowerExponential" for the power-exponential family or "Matern" for the Matern family.

cor_par

An optional data frame containing the correlation parameters with one row per sp_model term and two columns set up as described in GaSPModel Details; only used to start the first objective optimization (see Details).

random_error

A logical value indicating whether to include a random (measurement, white-noise) error term.

sp_var, error_var

Starting values of the stochastic process and error variances for the first try to optimize the objective (see Details); valid (i.e., nonnegative) values are used only if random_error = TRUE. The invalid default value of -1 indicates that Fit will choose a starting value.

nugget

For numerical stability, the proportion of the total variance due to random error is fixed at this value (random_error = FALSE) or bounded below by it (random_error = TRUE).

tries

Number of optimizations of the objective from different random starting points.

seed

The random-number seed to generate starting points.

fit_objective

The objective that Fit attempts to optimize: "Likelihood" (maximum likelihood estimation) or "Posterior" (Bayesian maximum a posteriori estimation).

theta_standardized_min, theta_standardized_max

The minimum and maximum of the standardized \(\theta\) parameter (see Details).

alpha_min, alpha_max

The minimum and maximum of the \(\alpha\) parameter of the power-exponential family.

derivatives_min, derivatives_max

The minimum and maximum of the \(\delta\) parameter of the Matern family.

log_obj_tol

An absolute tolerance for terminating the optimization of the log of the objective.

log_obj_diff

The critical value for the change in the log objective for informal tests during optimization of correlation parameters. No testing is done with the default of 0; a larger critical value such as 2 may give a more parsimonious model.

lambda_prior

The rate parameter of an exponential prior for each \(\theta\) parameter; used only if fit_objective = "Posterior".

model_comparison

The criterion used to select from multiple solutions when tries > 1: the objective function ("Objective") or leave-one-out cross validation ("CV").
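
The formula conventions for reg_model and sp_model can be illustrated with base R alone. The sketch below is not Fit's internal parsing; it only shows that a one-sided and a two-sided formula carry the same right-hand-side terms, and how an explicit formula naming every column of x could be built. The data frame and its column names x1 and x2 are hypothetical.

```r
# Hypothetical input data frame; column names are illustrative only.
x <- data.frame(x1 = c(0.1, 0.5, 0.9), x2 = c(10, 20, 30))

# A one-sided and a two-sided formula with the same right-hand side.
reg_one_sided <- ~ 1 + x1
reg_two_sided <- y ~ 1 + x1

# Both formulas carry the same right-hand-side terms; only the
# (unused) response differs.
rhs_one <- attr(terms(reg_one_sided), "term.labels")
rhs_two <- attr(terms(reg_two_sided), "term.labels")

# An explicit sp_model naming every column of x, which is what the
# default sp_model = NULL is documented to use.
sp_all <- reformulate(names(x))
sp_terms <- attr(terms(sp_all), "term.labels")
```

Writing the formula with or without a response is therefore a matter of style here, since the output data are always supplied through y.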

Details

Fit numerically optimizes the profile objective function with respect to the correlation parameters; the mean and overall variance parameters are estimated in closed form given the correlation parameters.

A cor_par data frame supplied by the user is the starting point for the first optimization try. If random_error = TRUE, then sp_var / (sp_var + error_var) is another correlation parameter to be optimized; sp_var and error_var values supplied by the user will initialize this parameter for the first try.

Set random_error = TRUE to estimate the variance of the random (measurement, white-noise) error. The small nugget error variance is included only for numerical stability.
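
To make the variance parameterization above concrete, the following base-R sketch computes the proportions involved. The starting values are made up for illustration, and applying the nugget bound with max() is an assumption about how such a bound could be imposed, not a description of Fit's code.

```r
# Hypothetical starting values, chosen only for illustration.
sp_var <- 4
error_var <- 1
nugget <- 1e-09  # default value from the Usage section

total_var <- sp_var + error_var

# Proportion of the total variance due to the stochastic process: the
# extra correlation parameter optimized when random_error = TRUE.
sp_prop <- sp_var / total_var

# Proportion due to random error, which the nugget bounds from below.
# Using max() here is an illustrative assumption.
error_prop <- max(error_var / total_var, nugget)
```

With these values the stochastic process accounts for 0.8 of the total variance and the random error for 0.2, well above the nugget bound.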

For term \(j\) in the stochastic-process model, the estimate of \(\theta_j\) is constrained between theta_standardized_min / \(r_j^2\) and theta_standardized_max / \(r_j^2\), where \(r_j\) is the range of term \(j\). Note that Fit returns unscaled estimates relating to the original, unscaled inputs.
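
The bounds described above can be sketched in base R. This example assumes that each stochastic-process term is a plain input column and that the range \(r_j\) is the maximum minus the minimum of that column; the data frame and the finite value of theta_standardized_max are hypothetical, chosen so the arithmetic is easy to follow.

```r
# Hypothetical training inputs; column names are illustrative only.
x <- data.frame(x1 = c(0, 2, 4), x2 = c(10, 10.5, 11))

theta_standardized_min <- 0
theta_standardized_max <- 100  # finite here purely for illustration

# Range of each input column, assumed to be max minus min.
r <- vapply(x, function(col) diff(range(col)), numeric(1))

# Bounds on the unscaled theta_j, following the rule stated above.
theta_lower <- theta_standardized_min / r^2
theta_upper <- theta_standardized_max / r^2
```

Here x1 has range 4 and x2 has range 1, so the upper bounds are 100 / 16 = 6.25 and 100 / 1 = 100: an input measured on a wider scale gets a proportionally tighter bound on its unscaled \(\theta_j\).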

References

Sacks, J., Welch, W.J., Mitchell, T.J., and Wynn, H.P. (1989) "Design and Analysis of Computer Experiments", Statistical Science, 4, pp. 409-423, doi:10.1214/ss/1177012413.

Examples

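library(GaSP)

# Training inputs and output from the borehole data set.
x <- borehole$x
y <- borehole$y

# Constant-mean regression model, Matern correlation family, no random
# error term, fitted by maximizing the log posterior.
borehole_fit <- Fit(
  reg_model = ~1, x = x, y = y, cor_family = "Matern",
  random_error = FALSE, nugget = 0, fit_objective = "Posterior"
)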
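
As a further sketch, the call below uses only arguments from the Usage section to fit a model with a random error term and then reads components listed under Value. The starting variances and the reduced number of tries are arbitrary choices for illustration, and the code is guarded so that it does nothing when the GaSP package is not installed.

```r
# Arguments taken from the Usage section; the starting variances and
# tries = 2 are arbitrary illustrative choices.
fit_args <- list(
  reg_model = ~1,
  cor_family = "PowerExponential",
  random_error = TRUE,
  sp_var = 1,
  error_var = 0.01,
  tries = 2,
  model_comparison = "CV"
)

if (requireNamespace("GaSP", quietly = TRUE)) {
  library(GaSP)
  fit_args$x <- borehole$x
  fit_args$y <- borehole$y
  noisy_fit <- do.call(Fit, fit_args)

  # Components documented under Value.
  print(noisy_fit$sp_var)     # estimated stochastic process variance
  print(noisy_fit$error_var)  # estimated random error variance
  print(noisy_fit$CVRMSE)     # leave-one-out cross-validation RMSE
}
```

Because sp_var and error_var are valid (nonnegative) and random_error = TRUE, they serve as starting values for the first optimization try, as described in Details.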
