fit_two_layer: MCMC sampling for two layer deep GP

Description

Conducts MCMC sampling of hyperparameters and hidden layer w for a two layer deep GP. Separate length scale parameters theta_w and theta_y govern the correlation strength of the hidden layer and outer layer respectively. Nugget parameter g governs noise on the outer layer. In Matern covariance, v governs smoothness.

Usage

fit_two_layer(
  x,
  y,
  dydx = NULL,
  nmcmc = 10000,
  D = ifelse(is.matrix(x), ncol(x), 1),
  monowarp = FALSE,
  pmx = FALSE,
  verb = TRUE,
  w_0 = NULL,
  theta_y_0 = 0.01,
  theta_w_0 = 0.1,
  g_0 = 0.001,
  true_g = NULL,
  v = 2.5,
  settings = NULL,
  cov = c("matern", "exp2"),
  vecchia = FALSE,
  m = NULL,
  ord = NULL,
  cores = NULL
)

Value

a list of the S3 class dgp2 or dgp2vec with elements:

x: copy of input matrix
y: copy of response vector
nmcmc: number of MCMC iterations
settings: copy of proposal/prior settings
v: copy of Matern smoothness parameter (v = 999 indicates cov = "exp2")
x_grid: grid used for monotonic warpings (monowarp = TRUE only)
dydx: copy of dydx (if not NULL)
grad_indx: stacked partial derivative indices (only if dydx is provided)
g: vector of MCMC samples for g
tau2_y: vector of MLE estimates for tau2 on the outer layer
theta_y: vector of MCMC samples for theta_y (length scale of outer layer)
tau2_w: matrix of MLE estimates for tau2 on inner layer (only returned if monowarp = TRUE, otherwise this is fixed in settings)
theta_w: matrix of MCMC samples for theta_w (length scale of inner layer)
w: list of MCMC samples for hidden layer w
w_grid: w values at x_grid locations (monowarp = TRUE only)
w_approx: Vecchia approximation object for outer layer (vecchia = TRUE only)
x_approx: Vecchia approximation object for inner layer (vecchia = TRUE only)
ll: vector of MVN log likelihood of the outer layer for reach Gibbs iteration
time: computation time in seconds

Arguments

x: vector or matrix of input locations
y: vector of response values
dydx: optional matrix of observed gradients, rows correspond to x locations, columns contain partial derivatives with respect to that input dimension (dim(dy) must match dim(x))
nmcmc: number of MCMC iterations
D: integer designating dimension of hidden layer, defaults to dimension of x
monowarp: logical or numeric. If FALSE, warpings are not forced to be monotonic. If TRUE, each input dimension is individually monotonically warped with a default grid size of 50. If numeric, triggers monotonic warpings with the provided grid size.
pmx: "prior mean x", logical indicating whether w should have prior mean of x (TRUE, requires D = ncol(x)) or prior mean zero (FALSE). pmx = TRUE is recommended for higher dimensions.
verb: logical indicating whether to print iteration progress
w_0: initial value for hidden layer w (rows must correspond to rows of x, requires ncol(w_0) = D. Defaults to the identity mapping. If nrow(w_0) < nrow(x), missing initial values are filled-in with the GP posterior mean.
theta_y_0: initial value for theta_y (length scale of outer layer)
theta_w_0: initial value for theta_w (length scale of inner layer), may be single value or vector of length D
g_0: initial value for g (only used if true_g = NULL)
true_g: if true nugget is known it may be specified here (set to a small value to make fit deterministic). Note - values that are too small may cause numerical issues in matrix inversions.
v: Matern smoothness parameter (only used if cov = "matern")
settings: hyperparameters for proposals and priors (see details)
cov: covariance kernel, either Matern ("matern") or squared exponential ("exp2")
vecchia: logical indicating whether to use Vecchia approximation
m: size of Vecchia conditioning sets, defaults to the lower of 25 or the maximum available (only used if vecchia = TRUE)
ord: optional ordering for Vecchia approximation, must correspond to rows of x, defaults to random, is applied to both x and w
cores: number of cores to use for OpenMP parallelization (vecchia = TRUE only). Defaults to min(4, maxcores - 1) where maxcores is the number of detectable available cores.

Details

Maps inputs x through hidden layer w to outputs y. Conducts sampling of the hidden layer using elliptical slice sampling. Utilizes Metropolis Hastings sampling of the length scale and nugget parameters with proposals and priors controlled by settings. When true_g is set to a specific value, the nugget is not estimated. When vecchia = TRUE, all calculations leverage the Vecchia approximation with specified conditioning set size m.

When monowarp = TRUE, each input dimension is warped separately and monotonically. This requires D = ncol(x) with x scaled to the unit cube. New in version 1.2.0 - monotonic warpings estimate separate scale parameters (tau2_w) on each latent node and use an isotropic lengthscale on the outer layer. As a default, monotonic warpings use the reference grid: seq(0, 1, length = 50). The grid size may be controlled by passing a numeric integer to monowarp (i.e., monowarp = 100 uses the grid seq(0, 1, length = 100)).

When pmx = TRUE, the prior on the latent layer is set at x (rather than the default of zero). This requires D = ncol(x).

NOTE on OpenMP: The Vecchia implementation relies on OpenMP parallelization for efficient computation. This function will produce a warning message if the package was installed without OpenMP (this is the default for CRAN packages installed on Apple machines). To set up OpenMP parallelization, download the package source code and install using the gcc/g++ compiler.

Proposals for g, theta_y, and theta_w follow a uniform sliding window scheme, e.g.,

g_star <- runif(1, l * g_t / u, u * g_t / l),

with defaults l = 1 and u = 2 provided in settings. To adjust these, set settings = list(l = new_l, u = new_u). Priors on g, theta_y, and theta_w follow Gamma distributions with shape parameters (alpha) and rate parameters (beta) controlled within the settings list object. Default priors differ for noisy/deterministic settings and depend on whether monowarp = TRUE. All default values are visible in the internal deepgp:::check_settings function. These priors are designed for x scaled to [0, 1] and y scaled to have mean 0 and variance 1. These may be adjusted using the settings input.

The scale on the latent layer (tau2_w) may also be specified in settings. Defaults to 1.

When w_0 = NULL, the hidden layer is initialized at x (i.e., the identity mapping). If w_0 is of dimension nrow(x) - 1 by D, the final row is filled-in using the GP posterior mean. This is helpful in sequential design when adding a new input location and starting the MCMC at the place where the previous MCMC left off.

The output object of class dgp2 or dgp2vec is designed for use with continue, trim, and predict.

References

Sauer, A. (2023). Deep Gaussian process surrogates for computer experiments. *Ph.D. Dissertation, Department of Statistics, Virginia Polytechnic Institute and State University.* http://hdl.handle.net/10919/114845

Booth, A. S. (2025). Deep Gaussian processes with gradients. arXiv:2512.18066

Sauer, A., Gramacy, R.B., & Higdon, D. (2023). Active learning for deep Gaussian process surrogates. *Technometrics, 65,* 4-18. arXiv:2012.08015

Sauer, A., Cooper, A., & Gramacy, R. B. (2023). Vecchia-approximated deep Gaussian processes for computer experiments. *Journal of Computational and Graphical Statistics, 32*(3), 824-837. arXiv:2204.02904

Barnett, S., Beesley, L. J., Booth, A. S., Gramacy, R. B., & Osthus D. (2025). Monotonic warpings for additive and deep Gaussian processes. *Statistics and Computing, 35*(3), 65. arXiv:2408.01540

Examples

Run this code

# Additional examples including real-world computer experiments are available at: 
# https://bitbucket.org/gramacylab/deepgp-ex/
# \donttest{
# Booth function (inspired by the Higdon function)
f <- function(x) {
  i <- which(x <= 0.58)
  x[i] <- sin(pi * x[i] * 6) + cos(pi * x[i] * 12)
  x[-i] <- 5 * x[-i] - 4.9
  return(x)
}

# Training data
x <- seq(0, 1, length = 25)
y <- f(x)

# Testing data
xx <- seq(0, 1, length = 100)
yy <- f(xx)

plot(xx, yy, type = "l")
points(x, y, col = 2)

# Example 1: nugget fixed, using continue
fit <- fit_two_layer(x, y, nmcmc = 1000, true_g = 1e-6)
plot(fit)
fit <- continue(fit, 1000) 
plot(fit, hidden = TRUE) # trace plots and ESS samples 
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit)

# Example 2: using Vecchia, re-approximated after burn-in 
fit <- fit_two_layer(x, y, nmcmc = 1000, true_g = 1e-6, vecchia = TRUE, m = 10)
fit <- continue(fit, 1000, re_approx = TRUE)
plot(fit, hidden = TRUE) # trace plots and ESS samples
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit)

# Example 3: using monotonic warpings
fit <- fit_two_layer(x, y, nmcmc = 2000, true_g = 1e-6, monowarp = TRUE)
plot(fit, hidden = TRUE) # trace plots and ESS samples
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit)
# }

Run the code above in your browser using DataLab