This function implements the sequential design of a (D)GP emulator or a bundle of (D)GP emulators.
design(
object,
N,
x_cand,
y_cand,
n_cand,
limits,
int,
f,
reps,
freq,
x_test,
y_test,
reset,
target,
method,
eval,
verb,
autosave,
new_wave,
cores,
...
)
# S3 method for gp
design(
object,
N,
x_cand = NULL,
y_cand = NULL,
n_cand = 200,
limits = NULL,
int = FALSE,
f = NULL,
reps = 1,
freq = c(1, 1),
x_test = NULL,
y_test = NULL,
reset = FALSE,
target = NULL,
method = vigf,
eval = NULL,
verb = TRUE,
autosave = list(),
new_wave = TRUE,
cores = 1,
...
)
# S3 method for dgp
design(
object,
N,
x_cand = NULL,
y_cand = NULL,
n_cand = 200,
limits = NULL,
int = FALSE,
f = NULL,
reps = 1,
freq = c(1, 1),
x_test = NULL,
y_test = NULL,
reset = FALSE,
target = NULL,
method = vigf,
eval = NULL,
verb = TRUE,
autosave = list(),
new_wave = TRUE,
cores = 1,
train_N = 100,
refit_cores = 1,
pruning = TRUE,
control = list(),
...
)
# S3 method for bundle
design(
object,
N,
x_cand = NULL,
y_cand = NULL,
n_cand = 200,
limits = NULL,
int = FALSE,
f = NULL,
reps = 1,
freq = c(1, 1),
x_test = NULL,
y_test = NULL,
reset = FALSE,
target = NULL,
method = vigf,
eval = NULL,
verb = TRUE,
autosave = list(),
new_wave = TRUE,
cores = 1,
train_N = 100,
refit_cores = 1,
...
)
An updated object is returned with a slot called design that contains:
S slots, named wave1, wave2, ..., waveS, that contain information about the S waves of sequential design that have been applied to the emulator.
Each slot contains the following elements:
N, an integer that gives the number of steps implemented in the corresponding wave;
rmse, a matrix that gives the RMSEs of the emulators constructed during the corresponding wave, if eval = NULL;
metric, a matrix that gives the customized evaluation metric values of the emulators constructed during the corresponding wave, if a customized function is supplied to eval;
freq, an integer that gives the frequency at which emulator validations are performed during the corresponding wave;
enrichment, a vector of size N that gives the number of new design points added after each step of the sequential design (if object is an instance of the gp or dgp class), or a matrix that gives the number of new design points added to each emulator in a bundle after each step of the sequential design (if object is an instance of the bundle class).
If target is not NULL, the following additional elements are also included:
target, the target RMSE(s) used to stop the sequential design;
reached, a bool (if object is an instance of the gp or dgp class) or a vector of bools (if object is an instance of the bundle class) indicating whether the target RMSEs were reached at the end of the sequential design.
A slot called type that gives the type of validation:
either LOO ('loo') or OOS ('oos') if eval = NULL. See validate() for more information about LOO and OOS;
'customized' if a customized R function is provided to eval.
Two slots called x_test and y_test that contain the data points for the OOS validation if the type slot is 'oos'.
If y_cand = NULL and the supplied f returns NAs during the sequential design, a slot called exclusion is included that records the located design positions at which f produced NAs. The sequential design uses this information to avoid re-visiting the same locations (if x_cand is supplied) or their neighborhoods (if x_cand is NULL) in later runs of design().
See Note section below for further information.
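The snippet below is a minimal sketch of how the per-wave step counts could be totalled. It assumes the waves are stored as list elements named wave1, wave2, ... under object$design, as described above. total_steps is a hypothetical helper. The mock object only imitates that documented layout and is not a real dgpsi emulator.

```r
# Hypothetical helper: total number of sequential-design steps over all waves.
# Assumes object$design holds elements named wave1, wave2, ..., each with an N entry.
total_steps <- function(object) {
  waves <- object$design[grepl("^wave[0-9]+$", names(object$design))]
  sum(vapply(waves, function(w) w$N, numeric(1)))
}

# Mock object imitating the documented layout (not a real dgpsi emulator)
mock <- list(design = list(wave1 = list(N = 10), wave2 = list(N = 7), type = "oos"))
total_steps(mock)  # 17
```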
object: can be one of the following:
the S3 class gp;
the S3 class dgp;
the S3 class bundle.
N: the number of steps for the sequential design.
x_cand: a matrix (with each row being a design point and each column being an input dimension) that gives a candidate set from which the next design point is determined. If x_cand = NULL, the candidate set will be generated using n_cand, limits, and int. Defaults to NULL.
y_cand: a matrix (with each row being a simulator evaluation and each column being an output dimension) that gives the realizations from the simulator at the input positions in x_cand. Defaults to NULL.
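If the simulator has already been run on a fixed set of inputs, x_cand and y_cand can be supplied together. design() then selects new points from that set without calling f. The sketch below builds such a pair from the two-dimensional test function used in the examples on this page. The 21 x 21 grid resolution is arbitrary. The design() call is wrapped in if (FALSE) because it needs an existing emulator m.

```r
# Two-dimensional test function (as in the examples)
f <- function(x) {
  sin(1 / ((0.7 * x[, 1, drop = FALSE] + 0.3) * (0.7 * x[, 2, drop = FALSE] + 0.3)))
}

# 21 x 21 grid on [0, 1]^2 as the fixed candidate set, one row per point
pts <- seq(0, 1, length.out = 21)
x_cand <- as.matrix(expand.grid(pts, pts))
colnames(x_cand) <- NULL
# simulator outputs at every candidate point, returned as a one-column matrix
y_cand <- f(x_cand)

if (FALSE) {
  # m is an emulator built with gp() or dgp()
  m <- design(m, N = 10, x_cand = x_cand, y_cand = y_cand)
}
```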
n_cand: an integer that gives
the size of the candidate set from which the next design point is determined, if x_cand = NULL;
the size of a subset to be sampled from the candidate set x_cand at each step of the sequential design to determine the next design point, if x_cand is not NULL.
Defaults to 200.
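When x_cand is supplied, n_cand sets the size of a random subset drawn from it at each step. The snippet below only illustrates that idea using base R's sample(). The exact sampling scheme used inside design() is an assumption here, and draw_subset is a hypothetical name.

```r
# Hypothetical illustration: draw n_cand candidate rows without replacement
draw_subset <- function(x_cand, n_cand = 200) {
  n_cand <- min(n_cand, nrow(x_cand))  # cannot sample more rows than exist
  idx <- sample(nrow(x_cand), n_cand)
  x_cand[idx, , drop = FALSE]
}

set.seed(1)
x_cand <- matrix(runif(1000 * 2), ncol = 2)  # 1000 candidates in 2D
sub <- draw_subset(x_cand, n_cand = 200)
dim(sub)  # 200 rows, 2 columns
```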
limits: a two-column matrix that gives the ranges of each input dimension, or a vector of length two if there is only one input dimension. If a vector is provided, it will be converted to a two-column, single-row matrix. The rows of the matrix correspond to input dimensions, and its first and second columns correspond to the minimum and maximum values of the input dimensions. This argument is only used when x_cand is not supplied, i.e., x_cand = NULL; set limits = NULL if x_cand is supplied. Defaults to NULL.
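The two accepted forms of limits are shown below. normalize_limits is a hypothetical helper that mirrors the documented conversion, turning a length-two vector into a one-row, two-column matrix. It also checks that each minimum is below its maximum.

```r
# Hypothetical helper mirroring the documented conversion of limits
normalize_limits <- function(limits) {
  if (is.null(dim(limits))) {
    # a length-two vector for a single input dimension -> 1 x 2 matrix
    limits <- matrix(limits, nrow = 1)
  }
  stopifnot(ncol(limits) == 2, all(limits[, 1] < limits[, 2]))
  limits
}

# one input dimension on [0, 5]
lim_1d <- normalize_limits(c(0, 5))
# two input dimensions, one row per dimension: [0, 1] and [-2, 2]
lim_2d <- normalize_limits(rbind(c(0, 1), c(-2, 2)))
```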
int: a bool or a vector of bools that indicates whether each input dimension is of integer type. If a bool is given, it will be applied to all input dimensions. If a vector is provided, its length should equal the number of input dimensions, and it will be applied to the individual input dimensions. Defaults to FALSE.
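The sketch below makes the roles of n_cand, limits, and int concrete by generating a candidate set from them. It uses plain uniform sampling and then rounds the integer dimensions. design() may use a different scheme, for example a space-filling one, so treat sample_candidates as a hypothetical stand-in.

```r
# Hypothetical stand-in for candidate-set generation from n_cand, limits and int
sample_candidates <- function(n_cand, limits, int = FALSE) {
  d <- nrow(limits)
  int <- rep_len(int, d)  # a single bool applies to every input dimension
  x <- matrix(NA_real_, nrow = n_cand, ncol = d)
  for (j in seq_len(d)) {
    x[, j] <- runif(n_cand, min = limits[j, 1], max = limits[j, 2])
    if (int[j]) x[, j] <- round(x[, j])  # integer-typed input dimension
  }
  x
}

set.seed(42)
lim <- rbind(c(0, 1), c(1, 10))
x_cand <- sample_candidates(200, lim, int = c(FALSE, TRUE))
```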
f: an R function that represents the simulator. f needs to be specified according to the following basic rules:
the first argument of the function should be a matrix with rows being different design points and columns being input dimensions.
the output of the function can be either
a matrix with rows being different outputs (corresponding to the input design points) and columns being output dimensions. If there is only one output dimension, the matrix still needs to be returned with a single column.
a list with the first element being the output matrix described above and, optionally, additional named elements that will update the values of any arguments with the same names passed via the ... argument. The list output can be useful if some additional arguments of f and aggregate need to be updated after each step of the sequential design.
See Note section below for further information. This argument is used when y_cand = NULL. Defaults to NULL.
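Here are two simulators that follow these rules. f_mat returns a single-column matrix. f_list also returns a named element, which updates an argument of the same name passed to design() through .... The counter argument n_runs is purely illustrative.

```r
# Returns a matrix: one row per input point, one column per output dimension
f_mat <- function(x) {
  matrix(rowSums(x^2), ncol = 1)  # keep a single column even for one output
}

# Returns a list: the output matrix first, then named updates for arguments
# supplied through `...` in design(); `n_runs` is a hypothetical counter
f_list <- function(x, n_runs = 0) {
  y <- matrix(rowSums(x^2), ncol = 1)
  list(y, n_runs = n_runs + nrow(x))
}

x <- matrix(c(0.1, 0.2, 0.3, 0.4), ncol = 2)  # two 2D input points
out <- f_list(x, n_runs = 5)

if (FALSE) {
  # n_runs would be updated from the list returned by f_list after each step
  m <- design(m, N = 10, limits = lim, f = f_list, n_runs = 0)
}
```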
reps: an integer that gives the number of repetitions of the located design points to be created and used for evaluations of f. Set the argument to an integer greater than 1 if f is a stochastic function that can generate different responses given the same input and the supplied emulator object can deal with stochastic responses, e.g., a (D)GP emulator with nugget_est = TRUE or a DGP emulator with a likelihood layer. The argument is only used when f is supplied. Defaults to 1.
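With reps > 1, each located design point is evaluated several times. The sketch below shows one way to build the repeated input matrix for a noisy simulator. How design() lays out the repetitions internally is an assumption, and f_noisy and repeat_rows are hypothetical names.

```r
# Hypothetical noisy simulator: same input, different responses
f_noisy <- function(x) {
  matrix(sin(2 * pi * x[, 1]) + rnorm(nrow(x), sd = 0.1), ncol = 1)
}

# Repeat every located design point `reps` times (row-wise)
repeat_rows <- function(x, reps = 1) {
  x[rep(seq_len(nrow(x)), each = reps), , drop = FALSE]
}

set.seed(7)
x_new <- matrix(c(0.25, 0.75), ncol = 1)  # two located points, 1D input
x_rep <- repeat_rows(x_new, reps = 3)
y_rep <- f_noisy(x_rep)  # six noisy responses, three per point
```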
freq: a vector of two integers, with the first element giving the frequency (in number of steps) at which to re-fit the emulator, and the second element giving the frequency at which to implement the emulator validation (for RMSE). Defaults to c(1, 1).
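One reading of freq = c(a, b) is that the emulator is re-fitted every a-th step and validated every b-th step. The helper below lists those steps under that assumption, counting steps from 1 and using the remainder. The exact counting rule inside design() is not stated here, and scheduled_steps is a hypothetical name.

```r
# Hypothetical helper: steps at which an action with period `every` happens
scheduled_steps <- function(N, every) {
  steps <- seq_len(N)
  steps[steps %% every == 0]
}

N <- 10
freq <- c(2, 5)
refit_steps <- scheduled_steps(N, freq[1])     # 2 4 6 8 10
validate_steps <- scheduled_steps(N, freq[2])  # 5 10
```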
x_test: a matrix (with each row being an input testing data point and each column being an input dimension) that gives the testing input data used to evaluate the emulator after each step of the sequential design. Set to NULL for the LOO-based emulator validation. Defaults to NULL. This argument is only used if eval = NULL.
y_test: the testing output data that correspond to x_test for the emulator validation after each step of the sequential design:
if object is an instance of the gp class, y_test is a matrix with only one column and each row being a testing output data point.
if object is an instance of the dgp class, y_test is a matrix with its rows being testing output data points and columns being output dimensions.
Set to NULL for the LOO-based emulator validation. Defaults to NULL. This argument is only used if eval = NULL.
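When eval = NULL and test data are given, the built-in metric is the RMSE between y_test and the emulator's predictions. Below is a minimal sketch of a per-output RMSE. It assumes a plain, unnormalized RMSE; whether dgpsi rescales it is not stated here. pred_mean holds hypothetical predictive means.

```r
# RMSE of each output column; a single value for a one-column y_test
rmse_per_output <- function(y_test, pred_mean) {
  sqrt(colMeans((y_test - pred_mean)^2))
}

y_test <- matrix(c(1, 2, 3), ncol = 1)
pred_mean <- matrix(c(1, 2, 5), ncol = 1)  # hypothetical predictive means
rmse_per_output(y_test, pred_mean)  # sqrt(4 / 3), about 1.155
```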
reset: a bool or a vector of bools indicating whether to reset the hyperparameters of the emulator to the initial values they had when the emulator was first constructed. The reset is applied after the input-output update and before the re-fit. If a bool is given, it will be applied to every step of the sequential design. If a vector is provided, its length should be equal to N, and it will be applied to the individual steps of the sequential design. Defaults to FALSE.
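reset accepts either one bool or a vector with one entry per step. For example, the vector below resets the hyperparameters only at every fifth step of a ten-step wave. The period of five is arbitrary.

```r
N <- 10
# TRUE at steps 5 and 10, FALSE elsewhere
reset <- seq_len(N) %% 5 == 0

if (FALSE) {
  m <- design(m, N = N, limits = lim, f = f, reset = reset)
}
```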
target: a numeric value or a vector that gives the target RMSEs at which the sequential design is terminated. Defaults to NULL, in which case the sequential design stops after N steps. See Note section below for further information about target.
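Below is a sketch of the stopping check implied by target. It assumes the design stops once every monitored RMSE is at or below its target, for example one target per emulator in a bundle. The comparison direction and the all() rule are assumptions, and target_check is a hypothetical name.

```r
# Hypothetical stopping check: which targets are reached, and whether to stop
target_check <- function(rmse, target) {
  reached <- rmse <= target
  list(reached = reached, stop = all(reached))
}

# bundle of three emulators with one target each
res <- target_check(rmse = c(0.08, 0.12, 0.05), target = c(0.1, 0.1, 0.1))
```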
method: an R function that gives the indices of design points in a candidate set. The function must satisfy the following basic rules:
the first argument is an emulator object that can be an instance of
the gp class (produced by gp());
the dgp class (produced by dgp());
the bundle class (produced by pack()).
the second argument is a matrix with rows representing a set of different design points.
the output of the function
is a vector of indices if the first argument is an instance of the gp class;
is a matrix of indices if the first argument is an instance of the dgp class. If different design points are to be added with respect to different outputs of the DGP emulator, the number of columns of the matrix should equal the number of outputs. If the design points are common to all outputs of the DGP emulator, the matrix should have a single column. If more than one design point is determined for a given output or for all outputs, the indices of these design points are placed in the matrix as extra rows.
is a matrix of indices if the first argument is an instance of the bundle class. Each row of the matrix gives the indices of the design points to be added to the individual emulators in the bundle.
See alm(), mice(), pei(), and vigf() for examples of customizing method. Defaults to vigf().
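Below is a sketch of a custom method for a gp emulator. It picks the candidate farthest from the existing design and ignores predictive uncertainty. This is a maximin-distance rule, not one of the built-in criteria. It assumes the training inputs are stored in object$data$X. The mock object lets the function run without dgpsi.

```r
# Custom method: return the index of the candidate with the largest
# distance to its nearest existing design point (gp case: a vector of indices)
maximin_method <- function(object, x_cand, ...) {
  X <- object$data$X  # assumption: training inputs are stored here
  nearest <- apply(x_cand, 1, function(p) min(sqrt(colSums((t(X) - p)^2))))
  which.max(nearest)
}

# Mock standing in for a gp emulator with two design points
mock_gp <- list(data = list(X = rbind(c(0, 0), c(1, 1))))
x_cand <- rbind(c(0.1, 0.1), c(1, 0), c(0.9, 0.9))
maximin_method(mock_gp, x_cand)  # 2, i.e. the point (1, 0)

if (FALSE) {
  m <- design(m, N = 10, limits = lim, f = f, method = maximin_method)
}
```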
eval: an R function that calculates the customized evaluation metric of the emulator. The function must satisfy the following basic rules:
the first argument is an emulator object that can be an instance of
the gp class (produced by gp());
the dgp class (produced by dgp());
the bundle class (produced by pack()).
the output of the function can be
a single metric value, if the first argument is an instance of the gp class;
a single metric value or a vector of metric values with length equal to the number of output dimensions, if the first argument is an instance of the dgp class;
a single metric value or a vector of metric values with length equal to the number of emulators in the bundle, if the first argument is an instance of the bundle class.
If no customized function is provided, the built-in evaluation metric, RMSE, will be calculated. Defaults to NULL. See Note section below for further information.
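Below is a sketch of a customized eval that reports the maximum absolute prediction error on a held-out set, with one value per output column. It assumes predict() on a dgpsi emulator returns the object with predictive means in results$mean. The validation data are passed through ... under the hypothetical names x_val and y_val, because x_test and y_test are already arguments of design().

```r
# Customized metric: maximum absolute error per output dimension
max_abs_error <- function(object, x_val, y_val) {
  pred <- predict(object, x = x_val)  # assumption: returns object with $results$mean
  err <- abs(y_val - pred$results$mean)
  apply(err, 2, max)  # length 1 for a single output
}

if (FALSE) {
  m <- design(m, N = 10, limits = lim, f = f, eval = max_abs_error,
              x_val = validate_x, y_val = validate_y)
}
```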
verb: a bool indicating whether trace information will be printed during the sequential design. Defaults to TRUE.
autosave: a list that contains configuration settings for the automatic saving of the emulator:
switch: a bool indicating whether to enable the automatic saving of the emulator during the sequential design. When set to TRUE, the emulator in the final iteration is always saved. Defaults to FALSE.
directory: a string specifying the directory path where the emulators will be stored. Emulators will be stored in a sub-directory of directory named 'emulator-id'. Defaults to './check_points'.
fname: a string representing the base name for the saved emulator files. Defaults to 'check_point'.
freq: an integer indicating the frequency of automatic savings, measured in the number of iterations. Defaults to 5.
overwrite: a bool controlling the file saving behavior. When set to TRUE, each new automatic saving overwrites the previous one, keeping only the latest version. If FALSE, each automatic saving creates a new file, preserving all previous versions. Defaults to FALSE.
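Below is an autosave configuration that uses the components listed above. The values are illustrative. Components left out are expected to take the defaults listed above.

```r
autosave <- list(
  switch = TRUE,                # turn automatic saving on
  directory = "./check_points", # emulators go into a sub-directory of this path
  fname = "check_point",        # base name of the saved files
  freq = 2,                     # save every 2 iterations
  overwrite = TRUE              # keep only the latest save
)

if (FALSE) {
  m <- design(m, N = 10, limits = lim, f = f, autosave = autosave)
}
```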
new_wave: a bool indicating whether the current execution of design() will create a new wave of sequential designs or add the sequential designs to the last existing wave. This argument is only used if there are existing waves in the emulator. By creating new waves, one can better visualize the performance of the sequential designs across different executions of design() in draw() and can specify a different evaluation frequency in freq. However, it can be beneficial to turn this option off to avoid visualizing a large number of waves in draw(), which could run out of colors. Defaults to TRUE.
cores: an integer that gives the number of cores to be used for emulator validations. If set to NULL, the number of cores is set to (max physical cores available - 1). Defaults to 1. This argument is only used if eval = NULL.
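cores = NULL is documented to mean (max physical cores available - 1). In base R, that number can be approximated with parallel::detectCores(). This is only an illustration, since how dgpsi itself counts cores is not described here. detectCores() can return NA on some platforms, so the sketch falls back to one core.

```r
library(parallel)

# Physical cores minus one, but never fewer than one
n_physical <- detectCores(logical = FALSE)
n_cores <- max(1L, n_physical - 1L, na.rm = TRUE)
```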
...: any arguments (with names different from those of the arguments used in design()) that are used by f, method, and eval can be passed here. design() will pass the relevant arguments to f, method, and eval based on the names of the additional arguments provided.
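Name-based routing of extra arguments can be pictured with the hypothetical helper route_args below. It keeps only the entries of ... whose names match formal arguments of a target function. This illustrates the documented behaviour and is not the code design() uses internally. The functions f and eval_fn are illustrative.

```r
# Hypothetical helper: keep the extra arguments a function can accept by name
route_args <- function(fn, extra) {
  extra[names(extra) %in% names(formals(fn))]
}

f <- function(x, noise_sd = 0) {
  matrix(rowSums(x) + rnorm(nrow(x), sd = noise_sd), ncol = 1)
}
eval_fn <- function(object, x_val, y_val) NULL  # placeholder eval

extra <- list(noise_sd = 0.1, x_val = matrix(0, 1, 2), y_val = matrix(0, 1, 1))
names(route_args(f, extra))        # "noise_sd"
names(route_args(eval_fn, extra))  # "x_val" "y_val"
```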
train_N: an integer or a vector of integers that gives the number of training iterations used to re-fit the DGP emulator at each step of the sequential design:
If train_N is an integer, then at each step the DGP emulator will be re-fitted (based on the re-fit frequency specified in freq) with train_N iterations.
If train_N is a vector, then its size must be N, even if the re-fit frequency specified in freq is not one.
Defaults to 100.
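When train_N is a vector, it needs one entry per step. The schedule below trains longer in the early steps and less later on. The numbers are arbitrary.

```r
N <- 10
# 200 iterations at step 1, decreasing to 50 at step N
train_N <- round(seq(200, 50, length.out = N))

if (FALSE) {
  m <- design(m, N = N, limits = lim, f = f, train_N = train_N)
}
```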
refit_cores: the number of cores/workers to be used to re-fit the GP components (in the same layer of a DGP emulator) at each M-step during the re-fitting. If set to NULL, the number of cores is set to (max physical cores available - 1). Only use multiple cores when there is a large number of GP components in different layers and the optimization of GP components is computationally expensive. Defaults to 1.
pruning: a bool indicating whether dynamic pruning of DGP structures will be implemented during the sequential design after the total number of design points exceeds min_size in control. The argument is only applicable to DGP emulators (i.e., object is an instance of the dgp class) produced by dgp() with struc = NULL. Defaults to TRUE.
control: a list that can supply any of the following components to control the dynamic pruning of the DGP emulator:
min_size, the minimum number of design points required to trigger the dynamic pruning. Defaults to 10 times the number of input dimensions.
threshold, the R2 value above which a GP node is considered redundant. Defaults to 0.97.
nexceed, the minimum number of consecutive iterations for which the R2 value of a GP node must exceed threshold to trigger the removal of that node from the DGP structure. Defaults to 3.
The argument is only used when pruning = TRUE.
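Below is a control list with the three documented components, plus a sketch of the nexceed rule. Under this rule, a node is flagged once its R2 has stayed above threshold for at least nexceed consecutive iterations. flag_node is a hypothetical name. Whether dgpsi compares with > or >= is an assumption.

```r
control <- list(min_size = 30, threshold = 0.97, nexceed = 3)

# Hypothetical check of the consecutive-exceedance rule on an R2 trace
flag_node <- function(r2_trace, threshold = 0.97, nexceed = 3) {
  runs <- rle(r2_trace > threshold)
  any(runs$values & runs$lengths >= nexceed)
}

flag_node(c(0.90, 0.98, 0.99, 0.98), control$threshold, control$nexceed)  # TRUE
flag_node(c(0.98, 0.99, 0.90, 0.98), control$threshold, control$nexceed)  # FALSE

if (FALSE) {
  m <- design(m, N = 10, limits = lim, f = f, pruning = TRUE, control = control)
}
```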
See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.
if (FALSE) {
# load packages and the Python env
library(lhs)
library(dgpsi)
# construct a 2D non-stationary function that takes a matrix as the input
f <- function(x) {
  sin(1/((0.7*x[,1,drop=FALSE]+0.3)*(0.7*x[,2,drop=FALSE]+0.3)))
}
# generate the initial design
X <- maximinLHS(5,2)
Y <- f(X)
# generate the validation data
validate_x <- maximinLHS(30,2)
validate_y <- f(validate_x)
# train a 2-layered DGP emulator with the initial design
m <- dgp(X, Y)
# specify the ranges of the input dimensions
lim_1 <- c(0, 1)
lim_2 <- c(0, 1)
lim <- rbind(lim_1, lim_2)
# 1st wave of the sequential design with 10 steps
m <- design(m, N=10, limits = lim, f = f, x_test = validate_x, y_test = validate_y)
# 2nd wave of the sequential design with 10 steps
m <- design(m, N=10, limits = lim, f = f, x_test = validate_x, y_test = validate_y)
# 3rd wave of the sequential design with 10 steps
m <- design(m, N=10, limits = lim, f = f, x_test = validate_x, y_test = validate_y)
# draw the design created by the sequential design
draw(m,'design')
# inspect the trace of RMSEs during the sequential design
draw(m,'rmse')
# reduce the number of imputations for faster OOS
m_faster <- set_imp(m, 5)
# plot the OOS validation with the faster DGP emulator
plot(m_faster, x_test = validate_x, y_test = validate_y)
}