spPredict: Function for prediction at new locations using `spNNGP` models.

Description

The function spPredict collects posterior predictive samples for a set of new locations given a spNNGP object.

Usage

spPredict(sp.obj, X.0, coords.0, start=1, end, thin=1, n.omp.threads = 1,
          verbose=TRUE, n.report=100, ...)

Arguments

sp.obj

an object returned by spNNGP.

X.0

the design matrix for prediction locations. An intercept should be provided in the first column if one is specified in sp.obj model.

coords.0

the spatial coordinates corresponding to X.0.

start

specifies the first sample included in the composition sampling.

end

specifies the last sample included in the composition. The default is to use all posterior samples in sp.obj.

thin

a sample thinning factor. The default of 1 considers all samples between start and end. For example, if thin = 10 then 1 in 10 samples are considered between start and end.

n.omp.threads

a positive integer indicating the number of threads to use for SMP parallel processing. The package must be compiled for OpenMP support. For most Intel-based machines, we recommend setting n.omp.threads up to the number of hyperthreaded cores.

verbose

if TRUE, model specification and progress of the sampler is printed to the screen. Otherwise, nothing is printed to the screen.

n.report

the interval to report sampling progress.

...

currently no additional arguments.

Value

A list comprising:

p.y.0

a matrix that holds the response variable posterior predictive samples where rows are locations corresponding to coords.0 and columns are samples.

p.w.0

a matrix that holds the random effect posterior predictive samples where rows are locations corresponding to coords.0 and columns are samples. This is only returned if spNNGP method = "sequential".

run.time

execution time reported using proc.time().

References

Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) Hierarchical Nearest-Neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, 111:800-812.

Finley, A.O., A. Datta, B.C. Cook, D.C. Morton, H.E. Andersen, and S. Banerjee (2017) Applying Nearest Neighbor Gaussian Processes to massive spatial data sets: Forest canopy height prediction across Tanana Valley Alaska, https://arxiv.org/abs/1702.00434v2.

Examples

Run this code

# NOT RUN {
rmvn <- function(n, mu=0, V = matrix(1)){
  p <- length(mu)
  if(any(is.na(match(dim(V),p))))
    stop("Dimension problem!")
  D <- chol(V)
  t(matrix(rnorm(n*p), ncol=p)%*%D + rep(mu,rep(n,p)))
}

##Make some data
set.seed(1)
n <- 100
coords <- cbind(runif(n,0,1), runif(n,0,1))

x <- cbind(1, rnorm(n))

B <- as.matrix(c(1,5))

sigma.sq <- 5
tau.sq <- 1
phi <- 3/0.5

D <- as.matrix(dist(coords))
R <- exp(-phi*D)
w <- rmvn(1, rep(0,n), sigma.sq*R)
y <- rnorm(n, x%*%B + w, sqrt(tau.sq))

ho <- sample(1:n, 50)

y.ho <- y[ho]
x.ho <- x[ho,,drop=FALSE]
w.ho <- w[ho]
coords.ho <- coords[ho,]

y <- y[-ho]
x <- x[-ho,,drop=FALSE]
w <- w[-ho,,drop=FALSE]
coords <- coords[-ho,]

##Fit a Response and Sequential NNGP model
n.samples <- 500

starting <- list("phi"=phi, "sigma.sq"=5, "tau.sq"=1)

tuning <- list("phi"=0.5, "sigma.sq"=0.5, "tau.sq"=0.5)

priors <- list("phi.Unif"=c(3/1, 3/0.01), "sigma.sq.IG"=c(2, 5), "tau.sq.IG"=c(2, 1))

cov.model <- "exponential"

n.report <- 500

##Predict for holdout set using both models
m.s <- spNNGP(y~x-1, coords=coords, starting=starting, method="sequential", n.neighbors=10,
              tuning=tuning, priors=priors, cov.model=cov.model,
              n.samples=n.samples, n.omp.threads=2, n.report=n.report)

m.r <- spNNGP(y~x-1, coords=coords, starting=starting, method="response", n.neighbors=10,
              tuning=tuning, priors=priors, cov.model=cov.model,
              n.samples=n.samples, n.omp.threads=2, n.report=n.report)

##Prediction for holdout data
p.s <- spPredict(m.s, X.0 = x.ho, coords.0 = coords.ho, n.omp.threads=2)

plot(apply(p.s$p.w.0, 1, mean), w.ho)
plot(apply(p.s$p.y.0, 1, mean), y.ho)

p.r <- spPredict(m.r, X.0 = x.ho, coords.0 = coords.ho, n.omp.threads=2)

points(apply(p.r$p.y.0, 1, mean), y.ho, pch=19, col="blue")

# }

Run the code above in your browser using DataLab