This function builds and trains a GP emulator.
gp(
X,
Y,
struc = NULL,
name = "sexp",
lengthscale = rep(0.1, ncol(X)),
bounds = NULL,
prior = "ref",
nugget_est = FALSE,
nugget = ifelse(nugget_est, 0.01, 1e-08),
scale_est = TRUE,
scale = 1,
training = TRUE,
verb = TRUE,
internal_input_idx = NULL,
linked_idx = NULL,
id = NULL
)
An S3 class named gp that contains six slots:
id: a number or character string assigned through the id argument.
data: a list that contains two elements, X and Y, which are the training input and output data, respectively.
specs: a list that contains seven elements:
  kernel: the type of the kernel function used. Either "sexp" for the squared exponential kernel or "matern2.5" for the Matérn-2.5 kernel.
  lengthscales: a vector of lengthscales in the kernel function.
  scale: the variance value in the kernel function.
  nugget: the nugget value in the kernel function.
  internal_dims: the column indices of X that correspond to the linked emulators in the preceding layers of a linked system.
  external_dims: the column indices of X that correspond to global inputs to the linked system of emulators. It is shown as FALSE if internal_input_idx = NULL.
  linked_idx: the value passed to argument linked_idx. It is shown as FALSE if the argument linked_idx is NULL.
  internal_dims and external_dims are generated only when struc = NULL.
constructor_obj: a 'python' object that stores the information of the constructed GP emulator.
container_obj: a 'python' object that stores the information for the linked emulation.
emulator_obj: a 'python' object that stores the information for the predictions from the GP emulator.
The returned gp object can be used by:
predict() for GP predictions.
validate() for leave-one-out (LOO) and out-of-sample (OOS) validations.
plot() for validation plots.
lgp() for linked (D)GP emulator constructions.
summary() to summarize the trained GP emulator.
write() to save the GP emulator to a .pkl file.
set_linked_idx() to add the linking information to the GP emulator for linked emulations.
design() for sequential designs.
update() to update the GP emulator with new inputs and outputs.
alm(), mice(), pei(), and vigf() to locate the next design points.
X: a matrix where each row is an input data point and each column is an input dimension.
Y: a matrix with only one column, where each row is an output data point.
struc: an object produced by kernel() that gives user-defined GP specifications. When struc = NULL, the GP specifications are automatically generated using the information provided in name, lengthscale, bounds, prior, nugget_est, nugget, scale_est, scale, and internal_input_idx. Defaults to NULL.
name: the kernel function to be used. Either "sexp" for the squared exponential kernel or "matern2.5" for the Matérn-2.5 kernel. Defaults to "sexp". This argument is only used when struc = NULL.
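To make the two kernel choices concrete, the base-R sketch below writes out the squared exponential and Matérn-2.5 correlation functions in one common parameterization, where r is the Euclidean distance after dividing each input dimension by its lengthscale. The helper names (scaled_dist, sexp_cor, matern25_cor) are hypothetical and not part of dgpsi, and dgpsi may scale the distance differently internally. Both functions equal 1 at zero distance and decay as r grows; the squared exponential corresponds to infinitely smooth functions, whereas the Matérn-2.5 kernel corresponds to functions that are twice differentiable.

```r
# Hypothetical helpers (not dgpsi functions); one common parameterization
# is assumed, which may differ from dgpsi's internal scaling.
# Lengthscale-scaled Euclidean distance between two input points.
scaled_dist <- function(x1, x2, l) {
  sqrt(sum(((x1 - x2) / l)^2))
}

# Squared exponential ("sexp") correlation: exp(-r^2)
sexp_cor <- function(x1, x2, l) {
  r <- scaled_dist(x1, x2, l)
  exp(-r^2)
}

# Matern-2.5 ("matern2.5") correlation: (1 + sqrt(5) r + 5 r^2 / 3) exp(-sqrt(5) r)
matern25_cor <- function(x1, x2, l) {
  r <- scaled_dist(x1, x2, l)
  (1 + sqrt(5) * r + 5 / 3 * r^2) * exp(-sqrt(5) * r)
}

# Two 2-D points exactly one lengthscale apart in the first dimension (r = 1)
sexp_cor(c(0, 0), c(0.1, 0), l = c(0.1, 0.1))     # exp(-1)
matern25_cor(c(0, 0), c(0.1, 0), l = c(0.1, 0.1))
```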
lengthscale: initial values of the lengthscales in the kernel function. It can be a single numeric value or a vector:
  if it is a single numeric value, the kernel functions across input dimensions are assumed to share the same lengthscale;
  if it is a vector (which must have length ncol(X)), the kernel functions across input dimensions are assumed to have different lengthscales.
Defaults to rep(0.1, ncol(X)), i.e., an initial lengthscale of 0.1 for each input dimension. This argument is only used when struc = NULL.
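The scalar-versus-vector rule for lengthscale can be sketched with a small validator. The function expand_lengthscale is hypothetical and not part of dgpsi; it only shows how the documented rule maps a supplied value onto the columns of X when the kernel is evaluated. Note that during training a shared lengthscale is a single parameter, so the expansion below describes evaluation, not how the parameter is estimated.

```r
# Hypothetical helper (not part of dgpsi) mirroring the documented rule:
# a single value is shared by all input dimensions, while a vector must
# supply exactly one lengthscale per column of X.
expand_lengthscale <- function(lengthscale, X) {
  d <- ncol(X)
  if (length(lengthscale) == 1) {
    # shared lengthscale: the same value applies to every input dimension
    return(rep(lengthscale, d))
  }
  if (length(lengthscale) != d) {
    stop("lengthscale must have length 1 or ncol(X)")
  }
  lengthscale  # one lengthscale per input dimension
}

X <- matrix(runif(20), ncol = 2)    # 10 input points in 2 dimensions
expand_lengthscale(0.1, X)          # c(0.1, 0.1)
expand_lengthscale(c(0.1, 0.5), X)  # c(0.1, 0.5)
```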
bounds: the lower and upper bounds of the lengthscales in the kernel function. It is a vector of length two where the first element is the lower bound and the second element is the upper bound. The bounds are applied to all lengthscales in the kernel function. Defaults to NULL, in which case no bounds are specified for the lengthscales. This argument is only used when struc = NULL.
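The meaning of a two-element bounds vector can be illustrated by projecting candidate lengthscales onto the common interval [bounds[1], bounds[2]], as a box-constrained optimiser would. The helper apply_bounds is hypothetical; it illustrates the constraint itself and is not a description of how dgpsi enforces it during training.

```r
# Hypothetical helper (not part of dgpsi): every lengthscale is constrained
# to the same interval [bounds[1], bounds[2]]; NULL means no constraint.
apply_bounds <- function(lengthscales, bounds = NULL) {
  if (is.null(bounds)) return(lengthscales)
  stopifnot(length(bounds) == 2, bounds[1] <= bounds[2])
  # clip below at the lower bound, then above at the upper bound
  pmin(pmax(lengthscales, bounds[1]), bounds[2])
}

apply_bounds(c(0.01, 0.5, 3), bounds = c(0.05, 2))  # c(0.05, 0.5, 2)
apply_bounds(c(0.01, 0.5, 3))                       # unchanged
```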
prior: the prior used for maximum a posteriori (MAP) estimation of the lengthscales and nugget of the GP: gamma prior ("ga"), inverse gamma prior ("inv_ga"), or jointly robust prior ("ref"). Defaults to "ref". This argument is only used when struc = NULL. See Gu (2019) in the references below for the jointly robust prior.
nugget_est: a bool indicating if the nugget term is to be estimated:
  FALSE: the nugget term is fixed to nugget.
  TRUE: the nugget term will be estimated.
Defaults to FALSE. This argument is only used when struc = NULL.
nugget: the initial nugget value. If nugget_est = FALSE, the assigned value is fixed during the training. For deterministic emulation, where the emulator interpolates the training data points, set nugget to a small value (e.g., 1e-8) and nugget_est to FALSE. For stochastic emulation, where the computer model outputs are assumed to follow a homogeneous Gaussian distribution, set nugget to a reasonably larger value and nugget_est to TRUE. Defaults to 1e-8 if nugget_est = FALSE and 0.01 if nugget_est = TRUE. This argument is only used when struc = NULL.
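The effect of the nugget on interpolation can be seen in a minimal base-R GP posterior mean with zero prior mean and fixed hyperparameters. This is a sketch, not dgpsi's implementation: the covariance is assumed to take the form scale * (R + nugget * I) with a squared exponential correlation R, so scale cancels in the posterior mean. With a tiny nugget the mean reproduces the training outputs; with a larger nugget it smooths them.

```r
# Minimal sketch (not dgpsi's implementation): zero-mean GP, fixed lengthscale,
# assumed covariance scale * (R + nugget * I); scale cancels in the mean.
sexp_matrix <- function(a, b, l) {
  # squared exponential correlations between 1-D input vectors a and b
  exp(-outer(a, b, function(u, v) (u - v)^2) / l^2)
}

post_mean <- function(x_new, X, Y, l = 0.1, nugget = 1e-8) {
  K <- sexp_matrix(X, X, l) + nugget * diag(length(X))
  k_star <- sexp_matrix(x_new, X, l)
  drop(k_star %*% solve(K, Y))
}

f <- function(x) ifelse(x < 0.5, -1, 1)  # vectorised step function
X <- seq(0, 1, length = 10)
Y <- f(X)

# Deterministic setting: the mean passes (almost) exactly through the data
max(abs(post_mean(X, X, Y, nugget = 1e-8) - Y))
# Larger nugget: the mean smooths the data instead of interpolating it
max(abs(post_mean(X, X, Y, nugget = 0.1) - Y))
```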
scale_est: a bool indicating if the variance is to be estimated:
  FALSE: the variance is fixed to scale.
  TRUE: the variance term will be estimated.
Defaults to TRUE. This argument is only used when struc = NULL.
scale: the initial variance value. If scale_est = FALSE, the assigned value is fixed during the training. Defaults to 1. This argument is only used when struc = NULL.
training: a bool indicating if the initialized GP emulator will be trained. When set to FALSE, gp() returns an untrained GP emulator, to which one can apply summary() to inspect its specifications (especially when a customized struc is provided) or apply predict() to check its emulation performance before training. Defaults to TRUE.
verb: a bool indicating if the trace information on GP emulator construction and training will be printed during the function execution. Defaults to TRUE.
internal_input_idx: the column indices of X that are generated by the linked emulators in the preceding layers. Set internal_input_idx = NULL if the GP emulator is in the first layer of a system or if all columns in X are generated by the linked emulators in the preceding layers. Defaults to NULL. This argument is only used when struc = NULL.
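The split between internal and external input columns can be illustrated in base R. The values below are hypothetical; the sketch assumes, consistent with the description of internal_dims and external_dims in specs, that the external columns are simply the columns of X not listed in internal_input_idx. It only shows the index bookkeeping, which gp() performs itself.

```r
# Hypothetical example: 10 input points with 3 columns, where columns 2 and 3
# are outputs of emulators in preceding layers and column 1 is a global input.
X <- matrix(runif(30), ncol = 3)
internal_input_idx <- c(2, 3)

# Assumed relationship: external columns are the complement of the internal ones
internal_dims <- internal_input_idx
external_dims <- setdiff(seq_len(ncol(X)), internal_input_idx)

X_internal <- X[, internal_dims, drop = FALSE]  # fed by linked emulators
X_external <- X[, external_dims, drop = FALSE]  # global inputs
external_dims  # 1
```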
linked_idx: either a vector or a list of vectors:
  If linked_idx is a vector, it gives the indices of columns in the pooled output matrix (formed by column-combining the outputs of all emulators in the feeding layer) that feed into the GP emulator. When internal_input_idx is not NULL, the length of the vector should equal the length of internal_input_idx. If the GP emulator is in the first layer of a linked emulator system, the vector gives the column indices of the global input (formed by column-combining the input matrices of all emulators in the first layer) that the GP emulator will use. If the GP emulator is to be used in both the first and subsequent layers, first set linked_idx to the values appropriate for the case where the emulator is not in the first layer, and then use set_linked_idx() to reset the linking information when the emulator is used in the first layer.
  When the GP emulator is not in the first layer of a linked emulator system, linked_idx can be a list that describes the connections between the GP emulator and the emulators in all preceding layers. The length of the list should equal the number of layers before the GP emulator. Each element of the list is a vector that gives the indices of columns in the pooled output matrix (formed by column-combining the outputs of all emulators) in the corresponding layer that feed into the GP emulator. If the GP emulator has no connections to any emulator in a certain layer, set NULL in the corresponding position of the list. The order of input dimensions in X[,internal_input_idx] should be consistent with linked_idx. For example, a GP emulator in the second layer that is fed by output dimensions 1 and 3 of the emulators in layer 1 should have linked_idx = list( c(1,3) ). In addition, the first and second columns of X[,internal_input_idx] should correspond to output dimensions 1 and 3 from layer 1.
  Set linked_idx = NULL if the GP emulator will not be used for linked emulations. If the emulator is later needed for linked emulation, use set_linked_idx() to add the linking information to it. Defaults to NULL.
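The pooled output matrix and the list(c(1,3)) example above can be mimicked with made-up data. In this hypothetical setup, layer 1 holds emulator A (two output dimensions) and emulator B (one output dimension), so the pooled matrix has columns A1, A2, B1. The sketch only reproduces the column selection implied by linked_idx; dgpsi does this wiring itself when a linked system is built with lgp().

```r
# Made-up layer-1 outputs: emulator A has 2 output dimensions, B has 1
n <- 5
out_A <- matrix(rnorm(2 * n), ncol = 2)
out_B <- matrix(rnorm(n), ncol = 1)

# Pooled output matrix of layer 1: column-combined outputs (A1, A2, B1)
pooled <- cbind(out_A, out_B)

# A layer-2 GP fed by pooled columns 1 and 3 (A's first output and B's output)
linked_idx <- list(c(1, 3))
layer2_inputs <- pooled[, linked_idx[[1]], drop = FALSE]

# Column order follows linked_idx, so X[, internal_input_idx] must match it
all.equal(layer2_inputs[, 1], out_A[, 1])
all.equal(layer2_inputs[, 2], out_B[, 1])
```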
id: an ID to be assigned to the GP emulator. If an ID is not provided (i.e., id = NULL), a UUID (Universally Unique Identifier) will be automatically generated and assigned to the emulator. Defaults to NULL.
See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.
Gu, M. (2019). Jointly robust prior for Gaussian stochastic process in emulation, calibration and variable selection. Bayesian Analysis, 14(3), 857-885.
if (FALSE) {
  # load the package and the Python env
  library(dgpsi)

  # construct a step function
  f <- function(x) {
    if (x < 0.5) return(-1)
    if (x >= 0.5) return(1)
  }

  # generate training data
  X <- seq(0, 1, length = 10)
  Y <- sapply(X, f)

  # training
  m <- gp(X, Y)

  # summarizing
  summary(m)

  # LOO cross validation
  m <- validate(m)
  plot(m)

  # prediction
  test_x <- seq(0, 1, length = 200)
  m <- predict(m, x = test_x)

  # OOS validation
  validate_x <- sample(test_x, 10)
  validate_y <- sapply(validate_x, f)
  plot(m, validate_x, validate_y)

  # write and read the constructed emulator
  write(m, 'step_gp')
  m <- read('step_gp')
}