validate: Validate a constructed GP, DGP, or linked (D)GP emulator

Description

This function validates a constructed GP, DGP, or linked (D)GP emulator via Leave-One-Out (LOO) cross validation or Out-Of-Sample (OOS) validation.

Usage

validate(object, x_test, y_test, method, verb, force, cores, ...)
# S3 method for gp
validate(
  object,
  x_test = NULL,
  y_test = NULL,
  method = "mean_var",
  verb = TRUE,
  force = FALSE,
  cores = 1,
  ...
)
# S3 method for dgp
validate(
  object,
  x_test = NULL,
  y_test = NULL,
  method = "mean_var",
  verb = TRUE,
  force = FALSE,
  cores = 1,
  threading = FALSE,
  ...
)
# S3 method for lgp
validate(
  object,
  x_test = NULL,
  y_test = NULL,
  method = "mean_var",
  verb = TRUE,
  force = FALSE,
  cores = 1,
  threading = FALSE,
  ...
)

Value

If object is an instance of the gp class, an updated object is returned with an additional slot called loo (for LOO cross validation) or oos (for OOS validation) that contains:
- two slots called x_train (or x_test) and y_train (or y_test) that contain the validation data points for LOO (or OOS).
- a column matrix called mean, if method = "mean_var", or median, if method = "sampling", that contains the predictive means or medians of the GP emulator at validation positions.
- three column matrices called std, lower, and upper that contain the predictive standard deviations and credible intervals of the GP emulator at validation positions. If method = "mean_var", the lower and upper bounds of a credible interval are two standard deviations below and above the predictive mean. If method = "sampling", the lower and upper bounds of a credible interval are the 2.5th and 97.5th percentiles.
- a numeric value called rmse that contains the root mean/median squared error of the GP emulator.
- a numeric value called nrmse that contains the (min-max) normalized root mean/median squared error of the GP emulator. The min-max normalization is based on the maximum and minimum values of the validation outputs contained in y_train (or y_test).

The rows of matrices (mean, median, std, lower, and upper) correspond to the validation positions.
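
As a rough illustration of the quantities described above, the following base R sketch computes the "mean_var" bounds, rmse, and nrmse from predictive means and standard deviations. All numbers are made up for illustration, and the formulas follow the description on this page rather than the package's internal code, which may differ in detail.

```r
# Hypothetical validation outputs and predictive summaries (illustrative values only).
y_val <- matrix(c(1.0, 2.0, 4.0, 3.0), ncol = 1)   # validation outputs (y_train or y_test)
mu    <- matrix(c(1.1, 1.8, 4.3, 2.9), ncol = 1)   # predictive means
std   <- matrix(c(0.2, 0.3, 0.25, 0.1), ncol = 1)  # predictive standard deviations

# Bounds for method = "mean_var": two standard deviations either side of the mean.
lower <- mu - 2 * std
upper <- mu + 2 * std

# Root mean squared error of the predictive means.
rmse <- sqrt(mean((mu - y_val)^2))

# Min-max normalization: divide by the range of the validation outputs.
nrmse <- rmse / (max(y_val) - min(y_val))
```

With method = "sampling", the same error formulas would presumably be applied to the predictive medians in place of the means.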

If object is an instance of the dgp class, an updated object is returned with an additional slot called loo (for LOO cross validation) or oos (for OOS validation) that contains:

- two slots called x_train (or x_test) and y_train (or y_test) that contain the validation data points for LOO (or OOS).
- a matrix called mean, if method = "mean_var", or median, if method = "sampling", that contains the predictive means or medians of the DGP emulator at validation positions.
- three matrices called std, lower, and upper that contain the predictive standard deviations and credible intervals of the DGP emulator at validation positions. If method = "mean_var", the lower and upper bounds of a credible interval are two standard deviations below and above the predictive mean. If method = "sampling", the lower and upper bounds of a credible interval are the 2.5th and 97.5th percentiles.
- a vector called rmse that contains the root mean/median squared errors of the DGP emulator across different output dimensions.
- a vector called nrmse that contains the (min-max) normalized root mean/median squared errors of the DGP emulator across different output dimensions. The min-max normalization is based on the maximum and minimum values of the validation outputs contained in y_train (or y_test).

The rows and columns of matrices (mean, median, std, lower, and upper) correspond to the validation positions and DGP emulator output dimensions, respectively.
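
For the "sampling" method with several output dimensions, the summaries described above can be sketched in base R as follows. The array of predictive samples is simulated here purely for illustration; how the package actually draws and stores its samples is not stated on this page, and normalizing each output dimension by its own range is an assumption.

```r
set.seed(1)
n_pos <- 5; n_out <- 2; n_samp <- 200

# Hypothetical predictive samples: validation positions x output dimensions x samples.
samples <- array(rnorm(n_pos * n_out * n_samp), dim = c(n_pos, n_out, n_samp))
# Hypothetical validation outputs: positions in rows, output dimensions in columns.
y_val <- matrix(rnorm(n_pos * n_out), nrow = n_pos)

# Predictive medians and 2.5th / 97.5th percentiles at every position and dimension.
med   <- apply(samples, c(1, 2), median)
lower <- apply(samples, c(1, 2), quantile, probs = 0.025)
upper <- apply(samples, c(1, 2), quantile, probs = 0.975)

# One error value per output dimension (column).
rmse  <- sqrt(colMeans((med - y_val)^2))
# Assumption: each dimension is normalized by the range of its own validation outputs.
nrmse <- rmse / (apply(y_val, 2, max) - apply(y_val, 2, min))
```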

If object is an instance of the lgp class, an updated object is returned with an additional slot called oos (for OOS validation) that contains:

- two slots called x_test and y_test that contain the validation data points for OOS.
- a list called mean, if method = "mean_var", or median, if method = "sampling", that contains the predictive means or medians of the linked (D)GP emulator at validation positions.
- three lists called std, lower, and upper that contain the predictive standard deviations and credible intervals of the linked (D)GP emulator at validation positions. If method = "mean_var", the lower and upper bounds of a credible interval are two standard deviations below and above the predictive mean. If method = "sampling", the lower and upper bounds of a credible interval are the 2.5th and 97.5th percentiles.
- a list called rmse that contains the root mean/median squared errors of the linked (D)GP emulator.
- a list called nrmse that contains the (min-max) normalized root mean/median squared errors of the linked (D)GP emulator. The min-max normalization is based on the maximum and minimum values of the validation outputs contained in y_test.

Each element in mean, median, std, lower, upper, rmse, and nrmse corresponds to a (D)GP emulator in the final layer of the linked (D)GP emulator.

Arguments

object

can be one of the following:

- the S3 class gp.
- the S3 class dgp.
- the S3 class lgp.

x_test

the OOS testing input data:

- if object is an instance of the gp or dgp class, x_test is a matrix where each row is an input testing data point and each column is an input dimension.
- if object is an instance of the lgp class, x_test can be a matrix or a list:
  - if x_test is a matrix, it is the global testing input data that feed into the emulators in the first layer of a system. The rows of x_test represent different input data points and the columns represent input dimensions across all emulators in the first layer of the system. In this case, it is assumed that the only global input to the system is the input to the emulators in the first layer and there is no global input to emulators in other layers.
  - if x_test is a list, it should have L elements, where L is the number of layers in the emulator system. The first element is a matrix that represents the global testing input data that feed into the emulators in the first layer of the system. The remaining L-1 elements are sub-lists, one per remaining layer, each containing as many matrices as there are emulators in the corresponding layer (rows being testing input data points and columns being input dimensions). These matrices represent the global testing input data to the emulators in that layer and must be ordered in the same way as their corresponding emulators are ordered in the struc argument of lgp(). If there is no global input data to a certain emulator, set NULL in the corresponding position of the sub-list.

x_test must be provided for the validation if object is an instance of the lgp class. Defaults to NULL.
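
A minimal sketch of the list form of x_test, assuming a hypothetical three-layer system: one emulator in layer 1 with two global input dimensions, two emulators in layer 2 (only the second receives a one-dimensional global input), and one emulator in layer 3 with no global input. The layout and dimensions are invented for illustration only.

```r
n <- 10  # number of testing input data points (same for every matrix)

x_test <- list(
  # Layer 1: matrix of global inputs to the first-layer emulators.
  matrix(runif(n * 2), ncol = 2),
  # Layer 2: one entry per emulator, in the order used in the struc argument;
  # NULL marks an emulator without global input.
  list(NULL, matrix(runif(n), ncol = 1)),
  # Layer 3: a single emulator without global input.
  list(NULL)
)
```

Note that list(NULL) is a list of length one whose only element is NULL, so each sub-list keeps one position per emulator.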

y_test

the OOS testing output data that correspond to x_test:

- if object is an instance of the gp class, y_test is a matrix with only one column and each row being a testing output data point.
- if object is an instance of the dgp class, y_test is a matrix with its rows being testing output data points and columns being output dimensions.
- if object is an instance of the lgp class, y_test can be a single matrix or a list of matrices:
  - if y_test is a single matrix, then there is only one emulator in the final layer of the linked emulator system and y_test represents the emulator's output with rows being testing positions and columns being output dimensions.
  - if y_test is a list, then y_test should contain M matrices, where M is the number of emulators in the final layer of the system. Each matrix has its rows corresponding to testing positions and columns corresponding to output dimensions of the associated emulator in the final layer.

y_test must be provided for the validation if object is an instance of the lgp class. Defaults to NULL.
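
The list form of y_test can be sketched in the same spirit, assuming a hypothetical system with two emulators in the final layer, the first with one output dimension and the second with three. The values are random placeholders.

```r
n <- 10  # number of testing positions

y_test <- list(
  matrix(rnorm(n), ncol = 1),      # first final-layer emulator: 1 output dimension
  matrix(rnorm(n * 3), ncol = 3)   # second final-layer emulator: 3 output dimensions
)

# Sanity check: every matrix has one row per testing position.
rows_ok <- all(vapply(y_test, nrow, integer(1)) == n)
```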

method

the prediction approach in validations: mean-variance ("mean_var") or sampling ("sampling") approach. Defaults to "mean_var".

verb

a bool indicating if the trace information on validations will be printed during the function execution. Defaults to TRUE.

force

a bool indicating whether to force the LOO or OOS re-evaluation when loo or oos slot already exists in object. When force = FALSE, validate() will try to determine automatically if the LOO or OOS re-evaluation is needed. Set force to TRUE when LOO or OOS re-evaluation is required. Defaults to FALSE.

cores

the number of cores/workers to be used for the LOO or OOS validation. If set to NULL, the number of cores is set to (max physical cores available - 1). Defaults to 1.
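
The rule for cores = NULL can be pictured with the base R sketch below. resolve_cores() is a hypothetical helper written for this illustration; it is not a function of the package, and the package's own detection logic may differ.

```r
# Hypothetical helper (not part of the package) mirroring the documented rule.
resolve_cores <- function(cores = 1) {
  # An explicit value is used as given.
  if (!is.null(cores)) return(as.integer(cores))
  # detectCores() can return NA on some platforms; fall back to a single core.
  n_physical <- parallel::detectCores(logical = FALSE)
  if (is.na(n_physical)) n_physical <- 1L
  # NULL means "physical cores minus one", but never fewer than one worker.
  max(1L, as.integer(n_physical) - 1L)
}

resolve_cores(3)     # explicit request for 3 workers
resolve_cores(NULL)  # derived from the machine's physical core count
```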

...

N/A.

threading

a bool indicating whether to use multi-threading to accelerate the LOO or OOS validation. Turning this option on could improve the speed of validations when the emulator is built with a moderately large number of training data points and the Matérn-2.5 kernel. Defaults to FALSE.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Examples


if (FALSE) {

# See gp(), dgp(), or lgp() for an example.
}
