aGP: Localized Approximate GP Regression For Many Predictive Locations

Description

Facilitates localized Gaussian process inference and prediction at a large set of predictive locations, by (essentially) calling laGP at each location, and returning the moments of the predictive equations, and indices into the design, thus obtained

Usage

aGP(X, Z, XX, start = 6, end = 50, d = NULL, g = 1/10000,
    method = c("alc", "alcray", "mspe", "nn", "fish"), Xi.ret = TRUE,
    close = min((1000+end)*if(method[1] == "alcray") 10 else 1, nrow(X)), 
    numrays = ncol(X), num.gpus = 0, gpu.threads = num.gpus, 
    omp.threads = if (num.gpus > 0) 0 else 1, 
    nn.gpu = if (num.gpus > 0) nrow(XX) else 0, verb = 1)
aGP.parallel(cls, XX, chunks = length(cls), X, Z, start = 6, end = 50, 
    d = NULL, g = 1/10000, method = c("alc", "alcray", "mspe", "nn", "fish"), 
    Xi.ret = TRUE, 
    close = min((1000+end)*if(method[1] == "alcray") 10 else 1, nrow(X)),
    numrays = ncol(X), num.gpus = 0, gpu.threads = num.gpus, 
    omp.threads = if (num.gpus > 0) 0 else 1,
    nn.gpu = if (num.gpus > 0) nrow(XX) else 0, verb = 1)
aGP.R(X, Z, XX, start = 6, end = 50, d = NULL, g = 1/10000,
    method = c("alc", "alcray", "mspe", "nn", "fish"), Xi.ret = TRUE,
    close = min((1000+end) *if(method[1] == "alcray") 10 else 1, nrow(X)),
    numrays = ncol(X), laGP=laGP.R, verb = 1)
aGPsep(X, Z, XX, start = 6, end = 50, d = NULL, g = 1/10000,
    method = c("alc", "alcray", "nn"), Xi.ret = TRUE,
    close = min((1000+end)*if(method[1] == "alcray") 10 else 1, nrow(X)),
    numrays = ncol(X),  omp.threads = 1, verb = 1)
aGPsep.R(X, Z, XX, start = 6, end = 50, d = NULL, g = 1/10000,
    method = c("alc", "alcray", "nn"), Xi.ret = TRUE,
    close = min((1000+end)*if(method[1] == "alcray") 10 else 1, nrow(X)),
    numrays = ncol(X), laGPsep=laGPsep.R, verb = 1)
aGP.seq(X, Z, XX, d, methods=rep("alc", 2), M=NULL, ncalib=0, ...)

Arguments

a matrix or data.frame containing the full (large) design matrix of input locations

a vector of responses/dependent values with length(Z) = nrow(X)

a matrix or data.frame of out-of-sample predictive locations with ncol(XX) = ncol(X); aGP calls laGP for each row of XX as a value of Xref, independently

start

the number of Nearest Neighbor (NN) locations to start each independent call to laGP with; must have start >= 6

end

the total size of the local designs; start < end

a prior or initial setting for the lengthscale parameter in a Gaussian correlation function; a (default) NULL value triggers a sensible regularization (prior) and initial setting to be generated via darg; a scalar specifies an initial value, causing darg to only generate the prior; otherwise, a list or partial list matching the output of darg can be used to specify a custom prior. In the case of a partial list, only the missing entries will be generated. Note that a default/generated list specifies MLE/MAP inference for this parameter. When specifying initial values, a vector of length nrow(XX) can be provided, giving a different initial value for each predictive location. With aGPsep, the starting values can be an ncol(X)-by-nrow(XX) matrix or an ncol(X) vector

a prior or initial setting for the nugget parameter; a NULL value causes a sensible regularization (prior) and initial setting to be generated via garg; a scalar (default g = 1/10000) specifies an initial value, causing garg to only generate the prior; otherwise, a list or partial list matching the output of garg can be used to specify a custom prior. In the case of a partial list, only the missing entries will be generated. Note that a default/generated list specifies no inference for this parameter; i.e., it is fixed at its starting or default value, which may be appropriate for emulating deterministic computer code output. In such situations a value much smaller than the default may work even better (i.e., yield better out-of-sample predictive performance). The default was chosen conservatively. When specifying non-default initial values, a vector of length nrow(XX) can be provided, giving a different initial value for each predictive location

method

specifies the method by which end-start candidates from X are chosen in order to predict at each row XX independently. In brief, ALC ("alc", default) minimizes predictive variance; ALCRAY ("alcray") executes a thrifty search focused on rays emanating from the reference location(s); MSPE ("mspe") augments ALC with extra derivative information to minimize mean-squared prediction error (requires extra computation); NN ("nn") uses nearest neighbor; and ("fish") uses the expected Fisher information - essentially 1/G from Gramacy & Apley (2015) - which is global heuristic, i.e., not localized to each row of XX

methods

for aGP.seq this is a vectorized method argument, containing a list of valid methods to perform in sequence. When methods = FALSE a call to M is invoked instead; see below for more details

Xi.ret

a scalar logical indicating whether or not a matrix of indices into X, describing the chosen sub-design for each of the predictive locations in XX, should be returned on output

a non-negative integer end < close <= nrow(X) specifying the number of NNs (to each row XX) in X to consider when searching for the sub-design; close = 0 specifies all. For method="alcray" this specifies the scope used to snap ray-based solutions back to elements of X, otherwise there are no restrictions on that search

numrays

a scalar integer indicating the number of rays for each greedy search; only relevant when method="alcray". More rays leads to a more thorough, but more computationally intensive search

laGP

applicable only to the R-version aGP.R, this is a function providing the local design implementation to be used. Either laGP or laGP.R can be provided, or a bespoke routine providing similar outputs

laGPsep

applicable only to the R-version aGPsep.R, this is a function providing the local design implementation to be used. Either laGPsep or laGPsep.R can be provided, or a bespoke routine providing similar outputs

num.gpus

applicable only to the C-version aGP, this is a scalar positive integer indicating the number of GPUs available for calculating ALC (see alcGP); the package must be compiled for CUDA support; see README/INSTALL in the package source for more details

gpu.threads

applicable only to the C-version aGP; this is a scalar positive integer indicating the number of SMP (i.e., CPU) threads queuing ALC jobs on a GPU; the package must be compiled for CUDA support. If gpu.threads >= 2 then the package must also be compiled for OpenMP support; see README/INSTALL in the package source for more details. We recommend setting gpu.threads to up to two-times the sum of the number of GPU devices and CPU cores. Only method = "alc" is supported when using CUDA. If the sum of omp.threads and gpu.threads is bigger than the max allowed by your system, then that max is used instead (giving gpu.threads preference)

omp.threads

applicable only to the C-version aGP; this is a scalar positive integer indicating the number of threads to use for SMP parallel processing; the package must be compiled for OpenMP support; see README/INSTALL in the package source for more details. For most Intel-based machines, we recommend setting omp.threads to up to two-times the number of hyperthreaded cores. When using GPUs (num.gpu > 0), a good default is omp.threads=0, otherwise load balancing could be required; see nn.gpu below. If the sum of omp.threads and gpu.threads is bigger than the max allowed by your system, then that max is used instead (giving gpu.threads preference)

nn.gpu

a scalar non-negative integer between 0 and nrow(XX) indicating the number of predictive locations utilizing GPU ALC calculations. Note this argument is only useful when both gpu.threads and omp.threads are non-zero, whereby it acts as a load balancing mechanism

verb

a non-negative integer specifying the verbosity level; verb = 0 is quiet, and larger values cause more progress information to be printed to the screen. The value min(0,verb-1) is provided to each laGP call

cls

a cluster object created by makeCluster from the parallel or snow packages

chunks

a scalar integer indicating the number of chunks to break XX into for parallel evaluation on cluster cls. Usually chunks = length(cl) is appropriate. However specifying more chunks can be useful when the nodes of the cluster are not homogeneous

an optional function taking two matrix inputs, of ncol(X)-ncalib and ncalib columns respectively, which is applied in lieu of aGP. This can be useful for calibration where the computer model is cheap, i.e., does not require emulation; more details below

ncalib

an integer between 1 and ncol(X) indicating how to partition X and XX inputs into the two matrices required for M

...

other arguments passed from aGP.sep to aGP

Value

The output is a list with the following components.

mean

a vector of predictive means of length nrow(XX)

var

a vector of predictive variances of length nrow(Xref)

llik

a vector indicating the log likelihood/posterior probability of the data/parameter(s) under the chosen sub-design for each predictive location in XX; provided up to an additive constant

time

a scalar giving the passage of wall-clock time elapsed for (substantive parts of) the calculation

method

a copy of the method argument

a full-list version of the d argument, possibly completed by darg

a full-list version of the g argument, possibly completed by garg

mle

if d$mle and/or g$mle are TRUE, then mle is a data.frame containing the values found for these parameters, and the number of required iterations, for each predictive location in XX

when Xi.ret = TRUE, this field contains a matrix of indices of length end into X indicating the sub-design chosen for each predictive location in XX

a copy of the input argument

The aGP.seq function only returns the output from the final aGP call. When methods = FALSE and M is supplied, the returned object is a data frame with a mean column indicating the output of the computer model run, and a var column, which at this time is zero

Details

This function invokes laGP with argument Xref = XX[i,] for each i=1:nrow(XX), building up local designs, inferring local correlation parameters, and obtaining predictive locations independently for each location. For more details see laGP.

The function aGP.R is a prototype R-only version for debugging and transparency purposes. It is slower than aGP, which is primarily in C. However it may be useful for developing new programs that involve similar subroutines. Note that aGP.R may provide different output than aGP due to differing library subroutines deployed in R and C.

The function aGP.parallel allows aGP to be called on segments of the XX matrix distributed to a cluster created by parallel. It breaks XX into chunks which are sent to aGP workers pointed to by the entries of cls. The aGP.parallel function collects the outputs from each chunk before returning an object almost identical to what would have been returned from a single aGP call. On a single (SMP) node, this represents is a poor-man's version of the OpenMP version described below. On multiple nodes both can be used.

If compiled with OpenMP flags, the independent calls to laGP will be farmed out to threads allowing them to proceed in parallel - obtaining nearly linear speed-ups. At this time aGP.R does not facilitate parallel computation, although a future version may exploit the parallel functionality for clustered parallel execution.

If num.gpus > 0 then the ALC part of the independent calculations performed by each thread will be offloaded to a GPU. If both gpu.threads >= 1 and omp.threads >= 1, some of the ALC calculations will be done on the GPUs, and some on the CPUs. In our own experimentation we have not found this to lead to large speedups relative to omp.threads = 0 when using GPUs. For more details, see Gramacy, Niemi, & Weiss (2014).

The aGP.sep function is provided primarily for use in calibration exercises, see Gramacy, et al. (2015). It automates a sequence of aGP calls, each with a potentially different method, successively feeding the previous estimate of local lengthscale (d) in as an initial set of values for the next call. It also allows the use of aGP to be bypassed, feeding the inputs into a user-supplied M function instead. This feature is enabled when methods = FALSE. The M function takes two matrices (same number of rows) as inputs, where the first ncol(X) - ncalib columns represent “field data” inputs shared by the physical and computer model (in the calibration context), and the remaining ncalib are the extra tuning or calibration parameters required to evalue the computer model. For examples illustrating aGP.seq please see the documentation file for discrep.est and demo("calib")

References

R.B. Gramacy (2016). laGP: Large-Scale Spatial Modeling via Local Approximate Gaussian Processes in R., Journal of Statistical Software, 72(1), 1-46; or see vignette("laGP")

R.B. Gramacy and D.W. Apley (2015). Local Gaussian process approximation for large computer experiments. Journal of Computational and Graphical Statistics, 24(2), pp. 561-678; preprint on arXiv:1303.0383; http://arxiv.org/abs/1303.0383

R.B. Gramacy, J. Niemi, R.M. Weiss (2014). Massively parallel approximate Gaussian process regression. SIAM/ASA Journal on Uncertainty Quantification, 2(1), pp. 568-584; preprint on arXiv:1310.5182; http://arxiv.org/abs/1310.5182

R.B. Gramacy and B. Haaland (2016). Speeding up neighborhood search in local Gaussian process prediction. Technometrics, 58(3), pp. 294-303; preprint on arXiv:1409.0074 http://arxiv.org/abs/1409.0074

Examples

Run this code

# NOT RUN {
## first, a "computer experiment"

## Simple 2-d test function used in Gramacy & Apley (2014);
## thanks to Lee, Gramacy, Taddy, and others who have used it before
f2d <- function(x, y=NULL)
  {
    if(is.null(y)){
      if(!is.matrix(x) && !is.data.frame(x)) x <- matrix(x, ncol=2)
      y <- x[,2]; x <- x[,1]
    }
    g <- function(z)
      return(exp(-(z-1)^2) + exp(-0.8*(z+1)^2) - 0.05*sin(8*(z+0.1)))
    z <- -g(x)*g(y)
  }

## build up a design with N=~40K locations
x <- seq(-2, 2, by=0.02)
X <- expand.grid(x, x)
Z <- f2d(X)

## predictive grid with NN=400 locations,
## change NN to 10K (length=100) to mimic setup in Gramacy & Apley (2014)
## the low NN set here is for fast CRAN checks
xx <- seq(-1.975, 1.975, length=10)
XX <- expand.grid(xx, xx)
ZZ <- f2d(XX)

## get the predictive equations, first based on Nearest Neighbor
out <- aGP(X, Z, XX, method="nn", verb=0)
## RMSE
sqrt(mean((out$mean - ZZ)^2))

# }
# NOT RUN {
## refine with ALC
out2 <- aGP(X, Z, XX, method="alc", d=out$mle$d)
## RMSE
sqrt(mean((out2$mean - ZZ)^2))

## visualize the results
par(mfrow=c(1,3))
image(xx, xx, matrix(out2$mean, nrow=length(xx)), col=heat.colors(128),
      xlab="x1", ylab="x2", main="predictive mean")
image(xx, xx, matrix(out2$mean-ZZ, nrow=length(xx)), col=heat.colors(128),
      xlab="x1", ylab="x2", main="bias")
image(xx, xx, matrix(sqrt(out2$var), nrow=length(xx)), col=heat.colors(128),
      xlab="x1", ylab="x2", main="sd")

## refine with MSPE
out3 <- aGP(X, Z, XX, method="mspe", d=out2$mle$d)
## RMSE
sqrt(mean((out3$mean - ZZ)^2))
# }
# NOT RUN {
## version with ALC-ray which is much faster than the ones not
## run above
out2r <- aGP(X, Z, XX, method="alcray", d=out$mle$d, verb=0)
sqrt(mean((out2r$mean - ZZ)^2))

## a simple example with estimated nugget
library(MASS)

## motorcycle data and predictive locations
X <- matrix(mcycle[,1], ncol=1)
Z <- mcycle[,2]
XX <- matrix(seq(min(X), max(X), length=100), ncol=1)

## first stage
out <- aGP(X=X, Z=Z, XX=XX, end=30, g=list(mle=TRUE), verb=0) 

## plot smoothed versions of the estimated parameters
par(mfrow=c(2,1))
df <- data.frame(y=log(out$mle$d), XX)
lo <- loess(y~., data=df, span=0.25)
plot(XX, log(out$mle$d), type="l")
lines(XX, lo$fitted, col=2)
dfnug <- data.frame(y=log(out$mle$g), XX)
lonug <- loess(y~., data=dfnug, span=0.25)
plot(XX, log(out$mle$g), type="l")
lines(XX, lonug$fitted, col=2)

## second stage design
out2 <- aGP(X=X, Z=Z, XX=XX, end=30, verb=0,
		d=list(start=exp(lo$fitted), mle=FALSE),
		g=list(start=exp(lonug$fitted)))

## plot the estimated surface
par(mfrow=c(1,1))
plot(X,Z)
df <- 20
s2 <- out2$var*(df-2)/df
q1 <- qt(0.05, df)*sqrt(s2) + out2$mean
q2 <- qt(0.95, df)*sqrt(s2) + out2$mean
lines(XX, out2$mean)
lines(XX, q1, col=1, lty=2)
lines(XX, q2, col=1, lty=2)

## compare to the single-GP result provided in the mleGP documentation
# }

Run the code above in your browser using DataLab