Build a sub-design of `X`

of size `end`

, and infer parameters,
for approximate Gaussian process prediction at reference location(s)
`Xref`

. Return the moments of those predictive equations, and indices
into the local design

```
laGP(Xref, start, end, X, Z, d = NULL, g = 1/10000,
method = c("alc", "alcopt", "alcray", "mspe", "nn", "fish"), Xi.ret = TRUE,
close = min((1000+end)*if(method[1] %in% c("alcray", "alcopt")) 10 else 1, nrow(X)),
alc.gpu = FALSE, numstart = if(method[1] == "alcray") ncol(X) else 1,
rect = NULL, lite = TRUE, verb = 0)
laGP.R(Xref, start, end, X, Z, d = NULL, g = 1/10000,
method = c("alc", "alcopt", "alcray", "mspe", "nn", "fish"),
Xi.ret = TRUE, pall = FALSE,
close = min((1000+end)*if(method[1] %in% c("alcray", "alcopt")) 10 else 1, nrow(X)),
parallel = c("none", "omp", "gpu"),
numstart = if(method[1] == "alcray") ncol(X) else 1,
rect = NULL, lite = TRUE, verb = 0)
laGPsep(Xref, start, end, X, Z, d = NULL, g = 1/10000,
method = c("alc", "alcopt", "alcray", "nn"), Xi.ret = TRUE,
close = min((1000+end)*if(method[1] %in% c("alcray", "alcopt")) 10 else 1, nrow(X)),
alc.gpu = FALSE, numstart = if(method[1] == "alcray") ncol(X) else 1,
rect = NULL, lite = TRUE, verb=0)
laGPsep.R(Xref, start, end, X, Z, d = NULL, g = 1/10000,
method = c("alc", "alcopt", "alcray", "nn"),
Xi.ret = TRUE, pall = FALSE,
close = min((1000+end)*if(method[1] %in% c("alcray", "alcopt")) 10 else 1, nrow(X)),
parallel = c("none", "omp", "gpu"),
numstart = if(method[1] == "alcray") ncol(X) else 1,
rect = NULL, lite = TRUE, verb = 0)
```

Xref

a vector of length `ncol(X)`

containing a single reference
location; or a `matrix`

with `ncol(Xref) = ncol(X)`

containing
multiple reference locations (unless `method = "alcray"`

)
for simultaneous sub-design and prediction

start

the number of Nearest Neighbor (NN) locations for initialization; must
specify `start >= 6`

end

the total size of the local designs; must have `start < end`

X

a `matrix`

or `data.frame`

containing
the full (large) design matrix of input locations

Z

a vector of responses/dependent values with `length(Z) = nrow(X)`

d

a prior or initial setting for the lengthscale
parameter for a Gaussian correlation function; a (default)
`NULL`

value causes a sensible regularization (prior) and
initial setting to be generated via `darg`

;
a scalar specifies an initial value, causing `darg`

to only generate the prior; otherwise, a list or partial list matching the output
of `darg`

can be used to specify a custom prior. In the
case of a partial list, the only the missing entries will be
generated. Note that a default/generated list specifies MLE/MAP
inference for this parameter. With
`laGPsep`

, the starting values can be an
`ncol(X)`

-by-`nrow(XX)`

`matrix`

or `ncol(X)`

vector

g

a prior or initial setting for the nugget parameter; a
`NULL`

value causes a sensible regularization (prior) and
initial setting to be generated via `garg`

; a scalar
(default `g = 1/10000`

) specifies an initial value,
causing `garg`

to only generate the prior; otherwise, a
list or partial list matching the output of `garg`

can be used to specify a custom prior. In the case of a partial list,
only the missing entries will be generated. Note that a
default/generated list specifies *no* inference for this
parameter; i.e., it is fixed at its starting or default value,
which may be appropriate for emulating
deterministic computer code output. In such situations a
value much smaller than the default may work even better (i.e.,
yield better out-of-sample predictive performance).
The default was chosen conservatively

method

Specifies the method by which `end-start`

candidates from
`X`

are chosen in order to predict at `Xref`

. In brief, ALC
(`"alc"`

, default) minimizes predictive variance; ALCRAY
(`"alcray")`

) executes a thrifty ALC-based search focused on
rays emanating from the reference location [must have `nrow(Xref) = 1`

];
ALCOPT (`"alcopt"`

) optimizes a continuous ALC analog via derivatives to
and snaps the solution back to the candidate grid;
MSPE (`"mspe"`

) augments ALC
with extra derivative information to minimize mean-squared prediction
error (requires extra computation);
NN (`"nn"`

) uses nearest neighbor; and EFI (`"fish"`

) uses
the expected Fisher information - essentially `1/G`

from
Gramacy & Apley (2015) - which is global heuristic, i.e., not
localized to `Xref`

.

Xi.ret

A scalar logical indicating whether or not a vector of indices
into `X`

, specifying the chosen sub-design, should be returned on
output

pall

a scalar logical (for `laGP.R`

only) offering the
ability to obtain predictions after every update (for progress
indication and debugging), rather than after just the last update

close

a non-negative integer `end < close <= nrow(X)`

specifying the number of NNs (to `Xref`

) in
`X`

to consider when searching for elements of the sub-design;
`close = 0`

specifies all. For `method="alcray"`

and
`method="alcopt"`

, this argument specifies the scope used to snap
solutions obtained via analog continuous searches back to elements of `X`

,
otherwise there are no restrictions on those searches. Since these
approximate searches are cheaper, they can afford a larger
“snapping scope” hence the larger default

alc.gpu

a scalar `logical`

indicating if a GPU should be used to
parallelize the evaluation of the ALC criteria (`method = "alc"`

).
Requires the package be compiled with CUDA flags; see README/INSTALL
in the package source for more details; currently only available
for `nrow(Xref) == 1`

via `laGP`

, not `laGPsep`

or the `.R`

variants,
and only supports off-loading ALC calculation to the GPU

parallel

a switch indicating if any parallel calculation of
the criteria is desired. Currently parallelization at this level is only
provided for option `method = "alc"`

).
For `parallel = "omp"`

, the package must be compiled with OMP flags;
for `parallel = "gpu"`

, the package must be compiled with CUDA
flags see README/INSTALL
in the package source for more details; currently only available
via `laGP.R`

numstart

a scalar integer indicating the number of rays for each
greedy search when `method="alcray"`

or the number of restarts
when `method="alcopt"`

. More rays or restarts
leads to a more thorough, but more computational intensive search.
This argument is not involved in other methods

rect

an optional `2`

-by-`ncol(X)`

`matrix`

describing a bounding
rectangle for `X`

that is used by the `"alcray"`

method.
If not specified, the rectangle is calculated from `range`

applied
to the columns of `X`

lite

Similar to the `predGP`

option of the same name,
this argument specifies whether (`TRUE`

, the default) or not (`FALSE`

) to return
a full covariance structure is returned, as opposed the diagonal only. A full covariance
structure requires more computation and more storage. This option is
only relevant when `nrow(Xref) > 1`

verb

a non-negative integer specifying the verbosity level; `verb = 0`

is quiet, and larger values cause more progress information to be
printed to the screen

The output is a `list`

with the following components.

a vector of predictive means of length `nrow(Xref)`

a vector of Student-t scale parameters of length
`nrow(Xref)`

a Student-t degrees of freedom scalar (applies to all
`Xref`

)

a scalar indicating the maximized log likelihood or log posterior probability of the data/parameter(s) under the chosen sub-design; provided up to an additive constant

a scalar giving the passage of wall-clock time elapsed for (substantive parts of) the calculation

a copy of the `method`

argument

a full-list version of the `d`

argument, possibly completed by `darg`

a full-list version of the `g`

argument, possibly
completed by `garg`

if `d$mle`

and/or `g$mle`

are `TRUE`

, then
`mle`

is a `data.frame`

containing the values found for
these parameters, and the number of required iterations

when `Xi.ret = TRUE`

, this field contains a vector of
indices of length `end`

into `X`

indicating the sub-design chosen

a copy of the input argument

A sub-design of `X`

of size `end`

is built-up according to
the criteria prescribed by the `method`

and then used to predict at
`Xref`

. The first `start`

locations are NNs in order to
initialize the first GP, via `newGP`

or `newGPsep`

,
and thereby initialize the
search. Each subsequent addition is found via the chosen criterion
(`method`

), and the GP fit is updated via `updateGP`

or `updateGPsep`

The runtime is cubic in `end`

, although
the multiplicative “constant” out front depends on the
`method`

chosen, the size of the design `X`

, and
`close`

. The `"alcray"`

method has a smaller constant
since it does not search over all candidates exhaustively.

After building the sub-design, local MLE/MAP lengthscale (and/or
nugget) parameters are estimated, depending on the `d`

and
`g`

arguments supplied. This is facilitated by calls to
`mleGP`

or `jmleGP`

.

Finally `predGP`

is called on the resulting local GP
model, and the parameters of the resulting Student-t distribution(s)
are returned. Unless `Xi.ret = FALSE`

, the indices of the
local design are also returned.

`laGP.R`

and `laGPsep.R`

are a prototype R-only version for
debugging and transparency purposes. They are slower than
`laGP`

and `laGPsep`

, which are primarily in C, and may not
provide identical output in all cases due to differing library implementations
used as subroutines; see note below for an example. `laGP.R`

and other
`.R`

functions in the package may be useful for developing new programs
that involve similar subroutines. The current version of `laGP.R`

allows OpenMP and/or GPU parallelization of the criteria (`method`

) if
the package is compiled with the appropriate flags. See README/INSTALL in
the package source for more information. For algorithmic details, see
Gramacy, Niemi, & Weiss (2014)

F. Sun, R.B. Gramacy, B. Haaland, E. Lawrence, and A. Walker (2019).
*Emulating satellite drag from large simulation experiments*,
SIAM/ASA Journal on Uncertainty Quantification, 7(2), pp. 720-759;
preprint on arXiv:1712.00182;
http://arxiv.org/abs/1712.00182

R.B. Gramacy (2016). *laGP: Large-Scale Spatial Modeling via
Local Approximate Gaussian Processes in R.*, Journal of Statistical
Software, 72(1), 1-46; or see `vignette("laGP")`

R.B. Gramacy and B. Haaland (2016).
*Speeding up neighborhood search in local Gaussian process prediction.*
Technometrics, 58(3), pp. 294-303;
preprint on arXiv:1409.0074
http://arxiv.org/abs/1409.0074

R.B. Gramacy and D.W. Apley (2015).
*Local Gaussian process approximation for large computer
experiments.* Journal of Computational and Graphical Statistics,
24(2), pp. 561-678; preprint on arXiv:1303.0383;
http://arxiv.org/abs/1303.0383

R.B. Gramacy, J. Niemi, R.M. Weiss (2014).
*Massively parallel approximate Gaussian process regression.*
SIAM/ASA Journal on Uncertainty Quantification, 2(1), pp. 568-584;
preprint on arXiv:1310.5182;
http://arxiv.org/abs/1310.5182

`vignette("laGP")`

,
`aGP`

, `newGP`

, `updateGP`

,
`predGP`

, `mleGP`

, `jmleGP`

,
`alcGP`

, `mspeGP`

, `alcrayGP`

,
`randLine`

## path-based local prediction via `laGP`

# NOT RUN { ## examining a particular laGP call from the larger analysis provided ## in the aGP documentation ## A simple 2-d test function used in Gramacy & Apley (2014); ## thanks to Lee, Gramacy, Taddy, and others who have used it before f2d <- function(x, y=NULL) { if(is.null(y)) { if(!is.matrix(x) && !is.data.frame(x)) x <- matrix(x, ncol=2) y <- x[,2]; x <- x[,1] } g <- function(z) return(exp(-(z-1)^2) + exp(-0.8*(z+1)^2) - 0.05*sin(8*(z+0.1))) z <- -g(x)*g(y) } ## build up a design with N=~40K locations x <- seq(-2, 2, by=0.02) X <- as.matrix(expand.grid(x, x)) Z <- f2d(X) ## optional first pass of nearest neighbor Xref <- matrix(c(-1.725, 1.725), nrow=TRUE) out <- laGP(Xref, 6, 50, X, Z, method="nn") ## second pass via ALC, ALCOPT, MSPE, and ALC-ray respectively, ## conditioned on MLE d-values found above out2 <- laGP(Xref, 6, 50, X, Z, d=out$mle$d) out2.alcopt <- laGP(Xref, 6, 50, X, Z, d=out2$mle$d, method="alcopt") out2.mspe <- laGP(Xref, 6, 50, X, Z, d=out2$mle$d, method="mspe") out2.alcray <- laGP(Xref, 6, 50, X, Z, d=out2$mle$d, method="alcray") ## look at the different designs plot(rbind(X[out2$Xi,], X[out2.mspe$Xi,]), type="n", xlab="x1", ylab="x2", main="comparing local designs") points(Xref[1], Xref[2], col=2, cex=0.5) text(X[out2$Xi,], labels=1:50, cex=0.6) text(X[out2.alcopt$Xi,], labels=1:50, cex=0.6, col="forestgreen") text(X[out2.mspe$Xi,], labels=1:50, cex=0.6, col="blue") text(X[out2.alcray$Xi,], labels=1:50, cex=0.6, col="red") legend("right", c("ALC", "ALCOPT", "MSPE", "ALCRAY"), text.col=c("black", "forestgreen", "blue", "red"), bty="n") ## compare computational time c(nn=out$time, alc=out2$time, alcopt=out2.alcopt$time, mspe=out2.mspe$time, alcray=out2.alcray$time) # } # NOT RUN { ## Joint path sampling: a comparison between ALC-ex, ALC-opt and NN ## defining a predictive path wx <- seq(-0.85, 0.45, length=100) W <- cbind(wx-0.75, wx^3+0.51) ## three comparators from Sun, et al. (2017) ## larger-than-default "close" argument to capture locations nearby path p.alc <- laGPsep(W, 6, 100, X, Z, close=10000, lite=FALSE) p.alcopt <- laGPsep(W, 6, 100, X, Z, method="alcopt", lite=FALSE) ## note that close=10*(1000+end) would be the default for method = "alcopt" p.nn <- laGPsep(W, 6, 100, X, Z, method="nn", close=10000, lite=FALSE) ## time comparison c(alc=p.alc$time, alcopt=p.alcopt$time, nn=p.nn$time) ## visualization plot(W, type="l", xlab="x1", ylab="x2", xlim=c(-2.25,0), ylim=c(-0.75,1.25), lwd=2) points(X[p.alc$Xi,], col=2, cex=0.6) lines(W[,1]+0.25, W[,2]-0.25, lwd=2) points(X[p.nn$Xi,1]+0.25, X[p.nn$Xi,2]-0.25, pch=22, col=3, cex=0.6) lines(W[,1]-0.25, W[,2]+0.25, lwd=2) points(X[p.alcopt$Xi,1]-0.25, X[p.alcopt$Xi,2]+0.25, pch=23, col=4, cex=0.6) legend("bottomright", c("ALC-opt", "ALC-ex", "NN"), pch=c(22, 21, 23), col=c(4,2,3)) # }