Fit (train) a GaSP model.
Fit(
  x,
  y,
  reg_model,
  sp_model = NULL,
  cor_family = c("PowerExponential", "Matern"),
  cor_par = data.frame(0),
  random_error = c(FALSE, TRUE),
  sp_var = -1,
  error_var = -1,
  nugget = 1e-09,
  tries = 10,
  seed = 500,
  fit_objective = c("Likelihood", "Posterior"),
  theta_standardized_min = 0,
  theta_standardized_max = .Machine$double.xmax,
  alpha_min = 0,
  alpha_max = 1,
  derivatives_min = 0,
  derivatives_max = 3,
  log_obj_tol = 1e-05,
  log_obj_diff = 0,
  lambda_prior = 0.1,
  model_comparison = c("Objective", "CV")
)
A GaSPModel object, which is a list with the following components:
The data frame containing the input training data.
The training output data, now as a vector.
The regression model, now in the form of a data frame.
The stochastic process model, now in the form of a data frame.
The correlation family.
A data frame for the estimated correlation parameters.
A boolean indicating whether a random error term is present.
The estimated stochastic process variance.
The estimated random error variance.
A data frame holding the estimated regression-model parameters.
The maximum value found for the objective function: the log likelihood (fit_objective = "Likelihood") or the log posterior (fit_objective = "Posterior").
The condition number.
The leave-one-out cross-validation root mean squared error.
A data frame containing the input (explanatory variable) training data.
A vector or a data frame with one column containing the output (response) training data.
The regression model, specified as a formula, but note the left-hand side of the formula is unused; see example.
An optional stochastic process model, specified as a formula, but note the left-hand side of the formula and the intercept are unused. The default NULL uses all column names in x.
A character string specifying the (product, anisotropic) correlation-function family: "PowerExponential" for the power-exponential family or "Matern" for the Matern family.
An optional data frame containing the correlation parameters, with one row per sp_model term and two columns set up as described in GaSPModel Details; only used to start the first objective optimization (see Details).
A boolean indicating whether a random (measurement, white-noise) error term is present.
Starting values of the stochastic process and error variances for the first try to optimize the objective (see Details); valid (i.e., nonnegative) values will only be used if random_error = TRUE. The invalid default value of -1 indicates that a starting value will be chosen by Fit.
For numerical stability the proportion of the total variance due to random error is fixed at this value (random_error = FALSE) or bounded below by it (random_error = TRUE).
Number of optimizations of the objective from different random starting points.
The random-number seed to generate starting points.
The objective that Fit attempts to optimize: "Likelihood" (maximum likelihood estimation) or "Posterior" (Bayesian maximum a posteriori estimation).
The minimum and maximum of the standardized \(\theta\) parameter (see Details).
The minimum and maximum of the \(\alpha\) parameter of power-exponential.
The minimum and maximum of the \(\delta\) parameter of Matern.
An absolute tolerance for terminating the optimization of the log of the objective.
The critical value for the change in the log objective for informal tests during optimization of correlation parameters. No testing is done with the default of 0; a larger critical value such as 2 may give a more parsimonious model.
The rate parameter of an exponential prior for each \(\theta\) parameter; used only if fit_objective = "Posterior".
The criterion used to select from multiple solutions when tries \( > 1\): the objective function ("Objective") or leave-one-out cross validation ("CV").
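The following sketch illustrates how the fit_objective, lambda_prior, tries, seed, and model_comparison arguments described above might be combined. The helper log_exp_prior is hypothetical and only shows the log density of independent exponential priors with a common rate; how GaSP scales \(\theta\) internally before applying the prior is not stated here, so the helper should not be read as the package's own computation. The Fit call uses only arguments listed in the usage and is skipped if the GaSP package is not installed.

```r
# Hypothetical helper: log density of independent exponential priors with a
# common rate, evaluated at a vector of theta values.
log_exp_prior <- function(theta, rate = 0.1) {
  sum(dexp(theta, rate = rate, log = TRUE))
}

# Illustrative theta values (not estimates from any fitted model).
penalty <- log_exp_prior(c(0.5, 2), rate = 0.1)

# Run the fit only if the GaSP package is installed.
if (requireNamespace("GaSP", quietly = TRUE)) {
  library(GaSP)
  map_fit <- Fit(
    reg_model = ~1, x = borehole$x, y = borehole$y,
    fit_objective = "Posterior", lambda_prior = 0.1,
    tries = 5, seed = 500,
    model_comparison = "CV"   # pick among the 5 solutions by leave-one-out CV
  )
}
```

With model_comparison = "CV", the solution retained from the multiple tries is the one judged best by leave-one-out cross validation rather than by the objective value.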
Fit numerically optimizes the profile objective function with respect to the correlation parameters; the mean and overall variance parameters are estimated in closed form given the correlation parameters.
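To illustrate the closed-form step, the sketch below computes generalized least squares estimates of the regression coefficients and the overall variance for a fixed correlation matrix. It is a minimal base-R illustration and not GaSP's internal code: the helper names gauss_cor and profile_estimates, the squared-exponential correlation, the small diagonal jitter, and the toy data are all assumptions made for the example. The variance estimate shown is the maximum likelihood version, which divides by the number of observations.

```r
# Hypothetical helper: squared-exponential correlation matrix for a numeric
# input matrix x, with one theta per column (an assumed correlation form).
gauss_cor <- function(x, theta) {
  n <- nrow(x)
  R <- matrix(1, n, n)
  for (j in seq_len(ncol(x))) {
    d <- outer(x[, j], x[, j], "-")
    R <- R * exp(-theta[j] * d^2)
  }
  R
}

# Hypothetical helper: closed-form estimates of the regression coefficients
# and the overall variance, given a correlation matrix R.
profile_estimates <- function(y, Fmat, R) {
  Rinv_F <- solve(R, Fmat)
  Rinv_y <- solve(R, y)
  beta <- solve(t(Fmat) %*% Rinv_F, t(Fmat) %*% Rinv_y)
  resid <- y - Fmat %*% beta
  sigma2 <- drop(t(resid) %*% solve(R, resid)) / length(y)
  list(beta = drop(beta), sigma2 = sigma2)
}

set.seed(1)
x_toy <- matrix(runif(20), ncol = 2)          # 10 toy points in 2 inputs
y_toy <- sin(2 * pi * x_toy[, 1]) + x_toy[, 2]
Fmat <- matrix(1, nrow(x_toy), 1)             # constant mean, like reg_model = ~1

# Small diagonal jitter added purely for numerical stability in this sketch.
R <- gauss_cor(x_toy, theta = c(5, 5)) + diag(1e-9, nrow(x_toy))
est <- profile_estimates(y_toy, Fmat, R)
```

An optimizer would then only need to search over the correlation parameters (here theta), re-evaluating these closed-form estimates at each candidate value.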
A cor_par data frame supplied by the user is the starting point for the first optimization try.
If random_error = TRUE, then sp_var / (sp_var + error_var) is another correlation parameter to be optimized; sp_var and error_var values supplied by the user will initialize this parameter for the first try.
Set random_error = TRUE to estimate the variance of the random (measurement, white-noise) error; the small nugget error variance, by contrast, is included only for numerical stability.
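As a sketch of how user-supplied variances feed into the first try, the example below computes the ratio sp_var / (sp_var + error_var) from two illustrative starting values and then passes them to Fit with random_error = TRUE. The numbers 4 and 1 are assumptions chosen for illustration, not recommended defaults, and the Fit call is skipped if the GaSP package is not installed.

```r
# Starting values for the two variances (illustrative numbers, not defaults).
sp_var_start <- 4
error_var_start <- 1

# Proportion of the total variance attributed to the stochastic process;
# per the text above, this ratio is what gets initialized for the first try.
start_ratio <- sp_var_start / (sp_var_start + error_var_start)

# Run the fit only if the GaSP package is installed.
if (requireNamespace("GaSP", quietly = TRUE)) {
  library(GaSP)
  noisy_fit <- Fit(
    reg_model = ~1, x = borehole$x, y = borehole$y,
    random_error = TRUE,
    sp_var = sp_var_start, error_var = error_var_start
  )
}
```

Leaving sp_var and error_var at their default of -1 would instead let Fit choose the starting value itself.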
For term \(j\) in the stochastic-process model, the estimate of \(\theta_j\) is constrained between theta_standardized_min / \(r_j^2\) and theta_standardized_max / \(r_j^2\), where \(r_j\) is the range of term \(j\).
Note that Fit returns unscaled estimates relating to the original, unscaled inputs.
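The scaling of the bounds can be sketched in base R as follows. The helper theta_bounds is hypothetical and assumes that each stochastic-process term is a single input column, so that the range of the term is simply the range of that column; the toy data frame and the standardized maximum of 100 are likewise chosen only for illustration.

```r
# Hypothetical helper: turn standardized theta bounds into bounds on the
# original input scale by dividing by the squared range of each column.
theta_bounds <- function(x, std_min = 0, std_max = .Machine$double.xmax) {
  r <- vapply(x, function(col) diff(range(col)), numeric(1), USE.NAMES = FALSE)
  data.frame(
    term = names(x), range = r,
    theta_min = std_min / r^2, theta_max = std_max / r^2
  )
}

# Toy inputs: column a has range 1, column b has range 20.
x_toy_df <- data.frame(a = c(0, 0.5, 1), b = c(10, 20, 30))
bounds <- theta_bounds(x_toy_df, std_min = 0, std_max = 100)
```

In this toy case the upper bound for column b is 100 / 20^2 = 0.25, while column a, with range 1, keeps the standardized bound of 100 unchanged.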
Sacks, J., Welch, W.J., Mitchell, T.J., and Wynn, H.P. (1989) "Design and Analysis of Computer Experiments", Statistical Science, 4, pp. 409-423, doi:10.1214/ss/1177012413.
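library(GaSP)

# Borehole training data supplied with the package.
x <- borehole$x
y <- borehole$y

# Constant-mean model; the left-hand side of reg_model is unused.
borehole_fit <- Fit(
  reg_model = ~1, x = x, y = y, cor_family = "Matern",
  random_error = FALSE, nugget = 0, fit_objective = "Posterior"
)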