
deepgp (version 1.2.1)

fit_three_layer: MCMC sampling for three layer deep GP

Description

Conducts MCMC sampling of hyperparameters, hidden layer z, and hidden layer w for a three-layer deep GP. Separate length scale parameters theta_z, theta_w, and theta_y govern the correlation strength of the inner, middle, and outer layers respectively. The nugget parameter g governs noise on the outer layer. With the Matern covariance, v governs smoothness.

Currently, there are no pmx, monowarp, or dydx options.

Usage

fit_three_layer(
  x,
  y,
  nmcmc = 10000,
  D = ifelse(is.matrix(x), ncol(x), 1),
  verb = TRUE,
  w_0 = NULL,
  z_0 = NULL,
  theta_y_0 = 0.01,
  theta_w_0 = 0.1,
  theta_z_0 = 0.1,
  g_0 = 0.001,
  true_g = NULL,
  v = 2.5,
  settings = NULL,
  cov = c("matern", "exp2"),
  vecchia = FALSE,
  m = NULL,
  ord = NULL,
  cores = NULL
)

Value

A list of S3 class dgp3 (or dgp3vec when vecchia = TRUE) with elements:

  • x: copy of input matrix

  • y: copy of response vector

  • nmcmc: number of MCMC iterations

  • settings: copy of proposal/prior settings

  • v: copy of Matern smoothness parameter (v = 999 indicates cov = "exp2")

  • g: vector of MCMC samples for g

  • tau2_y: vector of MLE estimates for tau2 on outer layer

  • theta_y: vector of MCMC samples for theta_y (length scale of outer layer)

  • theta_w: matrix of MCMC samples for theta_w (length scale of middle layer)

  • theta_z: matrix of MCMC samples for theta_z (length scale of inner layer)

  • w: list of MCMC samples for middle hidden layer w

  • z: list of MCMC samples for inner hidden layer z

  • w_approx: Vecchia approximation object for outer layer (vecchia = TRUE only)

  • z_approx: Vecchia approximation object for middle layer (vecchia = TRUE only)

  • x_approx: Vecchia approximation object for inner layer (vecchia = TRUE only)

  • ll: vector of MVN log likelihood of the outer layer for each Gibbs iteration

  • time: computation time in seconds

Arguments

x

vector or matrix of input locations

y

vector of response values

nmcmc

number of MCMC iterations

D

integer designating dimension of hidden layers, defaults to dimension of x

verb

logical indicating whether to print iteration progress

w_0

initial value for hidden layer w (rows must correspond to rows of x; requires ncol(w_0) = D). Defaults to the identity mapping. If nrow(w_0) < nrow(x), missing initial values are filled in with the GP posterior mean.

z_0

initial value for hidden layer z (rows must correspond to rows of x; requires ncol(z_0) = D). Defaults to the identity mapping. If nrow(z_0) < nrow(x), missing initial values are filled in with the GP posterior mean.

theta_y_0

initial value for theta_y (length scale of outer layer)

theta_w_0

initial value for theta_w (length scale of middle layer), may be single value or vector of length D

theta_z_0

initial value for theta_z (length scale of inner layer), may be single value or vector of length D

g_0

initial value for g

true_g

if the true nugget is known it may be specified here (set to a small value to make the fit deterministic). Note: values that are too small may cause numerical issues in matrix inversions.

v

Matern smoothness parameter (only used if cov = "matern")

settings

hyperparameters for proposals and priors (see details)

cov

covariance kernel, either Matern ("matern") or squared exponential ("exp2")

vecchia

logical indicating whether to use Vecchia approximation

m

size of Vecchia conditioning sets, defaults to the smaller of 25 and the maximum available (only used if vecchia = TRUE)

ord

optional ordering for Vecchia approximation, must correspond to rows of x, defaults to random, is applied to x, w, and z

cores

number of cores to use for OpenMP parallelization (vecchia = TRUE only). Defaults to min(4, maxcores - 1) where maxcores is the number of detectable available cores.

Details

Maps inputs x through hidden layer z then hidden layer w to outputs y. Conducts sampling of the hidden layers using elliptical slice sampling. Utilizes Metropolis-Hastings sampling of the length scale and nugget parameters with proposals and priors controlled by settings. When true_g is set to a specific value, the nugget is not estimated. When vecchia = TRUE, all calculations leverage the Vecchia approximation with specified conditioning set size m.

NOTE on OpenMP: The Vecchia implementation relies on OpenMP parallelization for efficient computation. This function will produce a warning message if the package was installed without OpenMP (this is the default for CRAN packages installed on Apple machines). To set up OpenMP parallelization, download the package source code and install using the gcc/g++ compiler.

Proposals for g, theta_y, theta_w, and theta_z follow a uniform sliding window scheme, e.g.,

g_star <- runif(1, l * g_t / u, u * g_t / l),

with defaults l = 1 and u = 2 provided in settings. To adjust these, set settings = list(l = new_l, u = new_u). Priors on g, theta_y, theta_w, and theta_z follow Gamma distributions with shape parameters (alpha) and rate parameters (beta) controlled within the settings list object. Default priors differ for noisy/deterministic settings. All default values are visible in the internal deepgp:::check_settings function. These priors are designed for x scaled to [0, 1] and y scaled to have mean 0 and variance 1. These may be adjusted using the settings input.
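The sliding-window proposal above can be sketched in a few lines of base R. The function propose_window below is a hypothetical standalone illustration (not deepgp internals verbatim); the commented call shows where a widened window would be supplied via settings.

```r
# Hypothetical sketch of the uniform sliding-window proposal: with the
# defaults l = 1 and u = 2, a proposal for g lands in [g_t / 2, 2 * g_t].
propose_window <- function(cur, l = 1, u = 2) {
  runif(1, min = l * cur / u, max = u * cur / l)
}

set.seed(42)
g_t <- 0.01
g_star <- propose_window(g_t)   # falls within [0.005, 0.02] here

# Widening the window in an actual fit:
# fit <- fit_three_layer(x, y, settings = list(l = 1, u = 3))
```

Larger u (or smaller l) widens the window, trading higher proposal variance for potentially lower acceptance rates.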

The scale on the latent layers (tau2_z and tau2_w) may also be specified in settings. Defaults to 1.

When w_0 = NULL and/or z_0 = NULL, the hidden layers are initialized at x (i.e., the identity mapping). If w_0 and/or z_0 is of dimension nrow(x) - 1 by D, the final row is filled-in using the GP posterior mean. This is helpful in sequential design when adding a new input location and starting the MCMC at the place where the previous MCMC left off.
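A minimal sketch of that sequential-design warm start, assuming a previous fit on n - 1 inputs provided final hidden-layer samples (w_prev and z_prev are placeholders here, not output of an actual fit); the commented fit_three_layer call shows where they would be passed.

```r
# Sketch: warm-starting hidden layers after adding one input location.
# w_prev / z_prev stand in for the last MCMC samples from a previous fit
# on the first n - 1 rows; fit_three_layer fills in the missing final row
# of each hidden layer via the GP posterior mean.
set.seed(1)
n <- 10; d <- 2
x_old <- matrix(runif((n - 1) * d), ncol = d)
x     <- rbind(x_old, runif(d))   # previous inputs plus one new location
w_prev <- x_old                   # placeholder for stored w sample
z_prev <- x_old                   # placeholder for stored z sample

# fit <- fit_three_layer(x, y, nmcmc = 2000, w_0 = w_prev, z_0 = z_prev)
stopifnot(nrow(w_prev) == nrow(x) - 1, ncol(w_prev) == d)
```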

The output object of class dgp3 or dgp3vec is designed for use with continue, trim, and predict.

References

Sauer, A. (2023). Deep Gaussian process surrogates for computer experiments. *Ph.D. Dissertation, Department of Statistics, Virginia Polytechnic Institute and State University.* http://hdl.handle.net/10919/114845

Sauer, A., Gramacy, R.B., & Higdon, D. (2023). Active learning for deep Gaussian process surrogates. *Technometrics, 65,* 4-18. arXiv:2012.08015

Sauer, A., Cooper, A., & Gramacy, R. B. (2023). Vecchia-approximated deep Gaussian processes for computer experiments. *Journal of Computational and Graphical Statistics, 32*(3), 824-837. arXiv:2204.02904

Examples

# Additional examples including real-world computer experiments are available at: 
# https://bitbucket.org/gramacylab/deepgp-ex/
# \donttest{
# G function in 2 dimensions (https://www.sfu.ca/~ssurjano/gfunc.html)
f <- function(xx, a = (seq_along(xx) - 1) / 2) {
  new1 <- abs(4 * xx - 2) + a
  new2 <- 1 + a
  p <- prod(new1 / new2)   # avoid shadowing base::prod
  return((p - 1) / 0.86)
}

# Training data
d <- 2
n <- 30
x <- matrix(runif(n * d), ncol = d)
y <- apply(x, 1, f)

# Testing data
n_test <- 500
xx <- matrix(runif(n_test * d), ncol = d)
yy <- apply(xx, 1, f)

i <- interp::interp(xx[, 1], xx[, 2], yy)
image(i, col = heat.colors(128))
contour(i, add = TRUE)
contour(i, level = -0.5, col = 4, add = TRUE) # potential failure limit
points(x)

# Example 1: nugget fixed, calculating entropy
fit <- fit_three_layer(x, y, nmcmc = 2000, true_g = 1e-6)
plot(fit)
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, entropy_limit = -0.5, cores = 1)
plot(fit)
i <- interp::interp(xx[, 1], xx[, 2], fit$entropy)
image(i, col = heat.colors(128), main = "Entropy")

# Example 2: using Vecchia
fit <- fit_three_layer(x, y, nmcmc = 2000, true_g = 1e-6, vecchia = TRUE, m = 10)
plot(fit)
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit)
# }
