aGP: Localized Approximate GP Regression For Many Predictive Locations

Description

Provides localized GP inference and prediction at a large set of predictive locations, by (essentially) calling laGP at each location, and returning the moments of the predictive equations thus obtained

Usage

aGP(X, Z, XX, start = 6, end = 50, d = NULL, g = 1/1000,
    method = c("alc", "mspe", "nn", "efi"), Xi.ret = TRUE,
    close = min(1000, nrow(X)), num.gpus = 0, 
    gpu.threads = num.gpus, omp.threads = if (num.gpus > 0) 0 else 1,
    nn.gpu = if (num.gpus > 0) nrow(XX) else 0, verb = 1)
aGP.snow(cls, XX, chunks = length(cls), X, Z, start = 6, end = 50, 
    d = NULL, g = 1/1000, method = c("alc", "mspe", "nn", "efi"), 
    Xi.ret = TRUE, close = min(1000, nrow(X)), num.gpus = 0, 
    gpu.threads = num.gpus, omp.threads = if (num.gpus > 0) 0 else 1,
    nn.gpu = if (num.gpus > 0) nrow(XX) else 0, verb = 1)
aGP.R(X, Z, XX, start = 6, end = 50, d = NULL, g = 1/1000,
      method = c("alc", "mspe", "nn", "efi"), Xi.ret = TRUE,
      close = min(1000, nrow(X)), laGP=laGP.R, verb = 1)

Arguments

a matrix or data.frame containing the full (large) design matrix of input locations

a vector of responses/dependent values with length(Z) = ncol(X)

a matrix of predictive locations with

ncol(XX) =
      ncol(X)

; aGP calls laGP for each row of XX as a value of Xref, independently

start

the number of Nearest Neighbor (NN) locations to start each independent call to laGP with; must have start >= 6

end

the total size of the local designs; start < end

a prior or initial setting for the (single/isotropic) lengthscale parameter in a Gaussian correlation function; a (default) NULL value triggers a sensible regularization (prior) and initial setting to be generated via

a prior or initial setting for the nugget parameter; a NULL value causes a sensible regularization (prior) and initial setting to be generated via garg; a scalar (default g = 1/

method

Specifies the method by which end-start candidates from X are chosen in order to predict at each row XX independently. In brief, ALC ("alc", default) minimizes predictive variance; MSPE (

Xi.ret

A scalar logical indicating whether or not a matrix of indices into X, describing the chosen sub-design for each of the predictive locations in XX, should be returned on output

a non-negative integer specifying the number of NNs (to each row XX) in X to consider when searching for the sub-design; close = 0 specifies all

laGP

applicable only to the R-version aGP.R, this is a function providing the local design implementation to be used. Either laGP or laGP.R can b

num.gpus

applicable only to the C-version aGP, this is a scalar positive integer indicating the number of GPUs available for calculating ALC (see alcGP); the package must be compiled for CUDA suppor

gpu.threads

applicable only to the C-version aGP; this is a scalar positive integer indicating the number of SMP (i.e., CPU) threads. If gpu.threads >= 2 then the package must also be compiled for OpenMP support; see README/

omp.threads

applicable only to the C-version aGP, this is a scalar positive integer indicating the number of threads to use for SMP parallel processing; the package must be compiled for OpenMP support; see README/INSTALL in the package source f

nn.gpu

a scalar non-negative integer between 0 and nrow(XX) indicating the number of predictive locations utilizing GPU ALC calculations. Note this argument is only useful when both gpu.threads and omp.

verb

a positive integer specifying the verbosity level; verb = 0 is quiet, and larger values cause more progress information to be printed to the screen. The value min(0,verb-1) is provided to each laGP call

cls

a cluster object created by makeCluster from the snow or parallel packages

chunks

a scalar integer indicating the number of chunks to break XX into for snow/parallel evaluation on a cluster cls. Usually chunks = length(cl) is appropriate. However slecifying mo

Value

The output is a list with the following components.
meana vector of predictive means of length nrow(XX)
vara vector of predictive variances of length nrow(Xref)
llika vector indicating the log likelihood/posterior probability of the data/parameter(s) under the chosen sub-design for each predictive location in XX; provided up to an additive constant
timea scalar giving the passage of wall-clock time elapsed for (substantive parts of) the calculation
methoda copy of the method argument
da full-list version of the d argument, possibly completed by darg
ga full-list version of the g argument, possibly completed by garg
mleif d$mle and/or g$mle are TRUE, then mle is a data.frame containing the values found for these parameters, and the number of required iterations, for each predictive location in XX
Xiwhen Xi.ret = TRUE, this field contains a matrix of indices of length end into X indicating the sub-design chosen for each predictive location in XX

Details

This function invokes laGP with

Xref
  = XX[i,]

for i=1:nrow(XX), building up a local design, inferring correlation parameters, and obtaining predictive locations independently for each location. For more details see laGP.

The function aGP.R is a prototype R-only version for debugging and transparency purposes. It is slower than aGP, which is primarily in C. However it may be useful for developing new programs that involve similar subroutines.

The function aGP.snow allows aGP to be called on segments of the XX matrix distributed to cluster created by snow or parallel. It breaks XX into chunks which are sent to aGP workers pointed to by the entries of cls. The aGP.snow function collects the outputs from each chunk before returning an object almost identical to what would have been returned from a single aGP call. On a single (SMP) node, this represents is a poor-man's version of the OpenMP version described below. On multiple nodes both can be used.

If compiled with OpenMP flags, the independent calls to laGP will be farmed out to threads allowing them to proceed in parallel - obtaining nearly linear speed-ups. At this time aGP.R does not facilitate parallel computation, although a future version may exploit the snow/parallel functionality for clustered parallel execution.

If num.gpus > 0 then the the ALC part of the independent calculations performed by each thread will be offloaded to a GPU. If both gpu.threads >= 1 and omp.threads >= 1, some of the ALC calculations will be done on the GPUs, and some on the CPUs. In our own experimentation we have not found this to lead to large speedups relative to omp.threads = 0 when using GPUs

References

R.B. Gramacy and D.W. Apley (2013). Local Gaussian process approximation for large computer experiments. Preprint available on arXiv:1303.0383; http://arxiv.org/abs/1303.0383

Examples

Run this code

## first, a "computer experiment"

## Simple 2-d test function used in Gramacy & Apley (2013);
## thanks to Lee, Gramacy, Taddy, and others who have used it before
f2d <- function(x, y=NULL)
  {
    if(is.null(y)) {
      if(!is.matrix(x)) x <- matrix(x, ncol=2)
      y <- x[,2]; x <- x[,1]
    }
    g <- function(z)
      return(exp(-(z-1)^2) + exp(-0.8*(z+1)^2) - 0.05*sin(8*(z+0.1)))
    z <- -g(x)*g(y)
  }

## build up a design with N=~40K locations
x <- seq(-2, 2, by=0.02)
X <- as.matrix(expand.grid(x, x))
Z <- f2d(X)

## predictive grid with NN=400 locations,
## change NN to 10K (length=100) to mimic setup in Gramacy & Apley (2013)
## recommend NN >= 400 (length=20) for pretty results below;
## the low NN set here is for fast CRAN checks
xx <- seq(-1.975, 1.975, length=10)
XX <- as.matrix(expand.grid(xx, xx))
ZZ <- f2d(XX)

## get the predictive equations, first based on Nearest Neighbor
out <- aGP(X, Z, XX, method="nn", verb=0)
## RMSE
sqrt(mean((out$mean - ZZ)^2))

## refine with ALC
out2 <- aGP(X, Z, XX, method="alc", d=out$mle$d)
## RMSE
sqrt(mean((out2$mean - ZZ)^2))

## visualize the results
par(mfrow=c(1,3))
image(xx, xx, matrix(out2$mean, nrow=length(xx)), col=heat.colors(128),
      xlab="x1", ylab="x2", main="predictive mean")
image(xx, xx, matrix(out2$mean-ZZ, nrow=length(xx)), col=heat.colors(128),
      xlab="x1", ylab="x2", main="bias")
image(xx, xx, matrix(sqrt(out2$var), nrow=length(xx)), col=heat.colors(128),
      xlab="x1", ylab="x2", main="sd")

## refine with MSPE
out3 <- aGP(X, Z, XX, method="mspe", d=out2$mle$d)
## RMSE
sqrt(mean((out3$mean - ZZ)^2))

## a simple example with estimated nugget
library(MASS)

## motorcycle data and predictive locations
X <- matrix(mcycle[,1], ncol=1)
Z <- mcycle[,2]
XX <- matrix(seq(min(X), max(X), length=100), ncol=1)

## first stage
out <- aGP(X=X, Z=Z, XX=XX, end=30, g=list(mle=TRUE), verb=0) 

## plot smoothed versions of the estimated parameters
par(mfrow=c(2,1))
df <- data.frame(y=log(out$mle$d), XX)
lo <- loess(y~., data=df, span=0.25)
plot(XX, log(out$mle$d), type="l")
lines(XX, lo$fitted, col=2)
dfnug <- data.frame(y=log(out$mle$g), XX)
lonug <- loess(y~., data=dfnug, span=0.25)
plot(XX, log(out$mle$g), type="l")
lines(XX, lonug$fitted, col=2)

## second stage design
out2 <- aGP(X=X, Z=Z, XX=XX, end=30, verb=0,
		d=list(start=exp(lo$fitted), mle=FALSE),
		g=list(start=exp(lonug$fitted)))

## plot the estimated surface
par(mfrow=c(1,1))
plot(X,Z)
df <- 20
s2 <- out2$var*(df-2)/df
q1 <- qt(0.05, df)*sqrt(s2) + out2$mean
q2 <- qt(0.95, df)*sqrt(s2) + out2$mean
lines(XX, out2$mean)
lines(XX, q1, col=1, lty=2)
lines(XX, q2, col=1, lty=2)

## compare to the single-GP result provided in the mleGP documentation

Run the code above in your browser using DataLab