laGP: Localized Approximate GP Regression At a Single Predictive Location

Description

Build a sub-design of X of size end for approximate Gaussian process prediction at reference location(s) Xref, and then return the moments of those predictive equations

Usage

laGP(Xref, start, end, X, Z, d = NULL, g = 1/1000,
     method = c("alc", "mspe", "nn", "efi"), Xi.ret = TRUE,
        close = min(1000, nrow(X)), alc.gpu = FALSE, verb = 0)
laGP.R(Xref, start, end, X, Z, d = NULL, g = 1/1000,
       method = c("alc", "mspe", "nn", "efi"), Xi.ret = TRUE,
       pall = FALSE, close = min(1000, nrow(X)),  
       parallel = c("none", "omp", "gpu"), verb = 0)

Arguments

Xref

a vector of length ncol(X) containing a single reference location; or a matrix with ncol(Xref) = ncol(X) containing multiple reference locations for simultaneous sub-design and prediction

start

the number of Nearest Neighbor (NN) locations for initialization; must specify start >= 6

end

the total size of the local designs; must have start < end

a matrix or data.frame containing the full (large) design matrix of input locations

a vector of responses/dependent values with length(Z) = ncol(X)

a prior or initial setting for the (single/isotropic) lengthscale parameter for a Gaussian correlation function; a (default) NULL value causes a sensible regularization (prior) and initial setting to be generated via

a prior or initial setting for the nugget parameter; a NULL value causes a sensible regularization (prior) and initial setting to be generated via garg; a scalar (default g = 1/

method

Specifies the method by which end-start candidates from X are chosen in order to predict at Xref. In brief, ALC ("alc", default) minimizes predictive variance; MLPE ("mspe") augm

Xi.ret

A scalar logical indicating whether or not a vector of indices into X, specifying the chosen sub-design, should be returned on output

pall

a scalar logical (for laGP.R only) offering the ability to obtain predictions after every update (for progress indication and debugging), rather than after just the last update

a non-negative integer specifying the number of NNs (to Xref) in X to consider when searching for the sub-design; close = 0 specifies all

alc.gpu

a scalar logical indicating if a GPU should be used to parallelize the evaluation of the ALC criteria (method = "alc"). Requires the package be compiled with CUDA flags; see README/INSTALL in the package source for m

parallel

a switch indicating if any parallel calculation of the criteria (method) is desired. For parallel = "omp", the package be compiled with OMP flags; for parallel = "gpu", the package must be compiled

verb

a positive integer specifying the verbosity level; verb = 0 is quiet, and larger values cause more progress information to be printed to the screen

Value

The output is a list with the following components.
meana vector of predictive means of length nrow(Xref)
s2a vector of Student-t scale parameters of length nrow(Xref)
dfa Student-t degrees of freedom scalar (applies to all Xref)
llika scalar indicating the maximized log likelihood or log posterior probability of the data/parameter(s) under the chosen sub-design; provided up to an additive constant
timea scalar giving the passage of wall-clock time elapsed for (substantive parts of) the calculation
methoda copy of the method argument
da full-list version of the d argument, possibly completed by darg
ga full-list version of the g argument, possibly completed by garg
mleif d$mle and/or g$mle are TRUE, then mle is a data.frame containing the values found for these parameters, and the number of required iterations
Xiwhen Xi.ret = TRUE, this field contains a vector of indices of length end into X indicating the sub-design chosen

Details

A sub-design of X of size end is built-up according to the criteria prescribed by the method used to predict at Xref. The first start locations are NNs in order to initialize the first GP, via newGP, and thereby initialize the search. Each subsequent addition is found via the chosen criterion (method), and the GP fit is updated via updateGP.

The runtime is cubic in end, although the multiplicative constant out front depends on the method chosen, the size of the design X, and close.

After building the sub-design, local MLE/MAP lengthscale (and/or nugget) parameters are estimated, depending on the d and g arguments supplied. This is facilitated by calls to mleGP or jmleGP.

Finally predGP is called on the resulting local GP model, and the parameters of the resulting Student-t distribution(s) are returned. The function laGP.R is a prototype R-only version for debugging and transparency purposes. It is slower than laGP, which is primarily in C. However it may be useful for developing new programs that involve similar subroutines. The current version allows OpenMP and/or GPU parallelization of the criteria (method) if the package is compiled with the appropriate flags. See README/INSTALL in the package source for more information

References

R.B. Gramacy and D.W. Apley (2013). Local Gaussian process approximation for large computer experiments. Preprint available on arXiv:1303.0383; http://arxiv.org/abs/1303.0383

Examples

Run this code

## examining a particular laGP call from the larger analysis provided
## in the aGP documentation

## Simple 2-d test function used in Gramacy & Apley (2013);
## thanks to Lee, Gramacy, Taddy, and others who have used it before
f2d <- function(x, y=NULL)
  {
    if(is.null(y)) {
      if(!is.matrix(x)) x <- matrix(x, ncol=2)
      y <- x[,2]; x <- x[,1]
    }
    g <- function(z)
      return(exp(-(z-1)^2) + exp(-0.8*(z+1)^2) - 0.05*sin(8*(z+0.1)))
    z <- -g(x)*g(y)
  }

## build up a design with N=~40K locations
x <- seq(-2, 2, by=0.02)
X <- as.matrix(expand.grid(x, x))
Z <- f2d(X)

## predictive grid with NN=400 locations,
## change NN to 10K to mimic setup in Gramacy & Apley (2013)
xx <- seq(-1.975, 1.975, length=20)
XX <- as.matrix(expand.grid(xx, xx))
ZZ <- f2d(XX)

## local analysis, first pass
Xref <- XX[250,]
out <- laGP(Xref, 6, 50, X, Z, method="nn")

## second and pass via ALC and MSPE, respectively
out2 <- laGP(Xref, 6, 50, X, Z, d=out$mle$d)
out2.mspe <- laGP(Xref, 6, 50, X, Z, d=out2$mle$d)

## look at the different designs
plot(rbind(X[out2$Xi,], X[out2.mspe$Xi,]), type="n",
     xlab="x1", ylab="x2", main="comparing local designs")
points(Xref[1], Xref[2], col=2, cex=0.5)
text(X[out2$Xi,], labels=1:50, cex=0.6)
text(X[out2.mspe$Xi,], labels=1:50, cex=0.6, col="green")
legend("topright", c("pass 2 ALC", "pass 3 MSPE"),
       text.col=c("black", "green"), bty="n")

Run the code above in your browser using DataLab