spNNGP: Function for Fitting Univariate Bayesian Spatial Regression Models

Description

The function spNNGP fits Gaussian and non-Gaussian univariate Bayesian spatial regression models using Nearest Neighbor Gaussian Processes (NNGP).

Usage

spNNGP(formula, data = parent.frame(), coords, method = "response",
      family="gaussian", weights, n.neighbors = 15, 
      starting, tuning, priors, cov.model = "exponential",
      n.samples, n.omp.threads = 1, search.type = "cb", ord,
      return.neighbor.info = FALSE, neighbor.info,
      fit.rep = FALSE, sub.sample, verbose = TRUE, n.report = 100, ...)

Arguments

formula

a symbolic description of the regression model to be fit. See example below.

data

an optional data frame containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which spNNGP is called.

coords

an \(n \times 2\) matrix of the observation coordinates in \(R^2\) (e.g., easting and northing), or if data is a data frame then coords can be a vector of length two comprising coordinate column names or indices. There can be no duplicate locations.

method

a quoted keyword that specifies the NNGP sampling algorithm. Supported method keywords are: "response" and "latent". When \(n\) is large, the "response" algorithm should be faster and provide finer control over Metropolis acceptance rate for covariance parameters. In general, unless estimates of spatial random effects are needed, the "response" algorithm should be used.

family

a quoted keyword that specifies the data likelihood. Choices are "gaussian" for continuous outcome and "binomial" for discrete outcome which assumes a logistic link modeled using Polya-Gamma latent variables.

weights

specifies the number of trials for each observation when family="binomial". The default is 1 trial for each observation. Valid arguments are a scalar value that specifies the number of trials used if all observations have the same number of trials, and a vector of length \(n\) that specifies the number of trials for each observation used there are differences in the number of trials.

n.neighbors

number of neighbors used in the NNGP.

starting

a list with each tag corresponding to a parameter name. Valid tags are beta, sigma.sq, tau.sq, phi, and nu. nu is only specified if cov.model="matern". The value portion of each tag is the parameter's startingvalue.

tuning

a list with each tag corresponding to a parameter name. Valid tags are sigma.sq, tau.sq, phi, and nu. If method="latent" then only phi and nu need to be specified. The value portion of each tag defines the variance of the Metropolis sampler Normal proposal distribution.

priors

a list with each tag corresponding to a parameter name. Valid tags are sigma.sq.ig, tau.sq.ig, phi.unif, and nu.unif. Variance parameters, simga.sq and tau.sq, are assumed to follow an inverse-Gamma distribution, whereas the spatial decay phi and smoothness nu parameters are assumed to follow Uniform distributions. The hyperparameters of the inverse-Gamma are passed as a vector of length two, with the first and second elements corresponding to the shape and scale, respectively. The hyperparameters of the Uniform are also passed as a vector of length two with the first and second elements corresponding to the lower and upper support, respectively.

cov.model

a quoted keyword that specifies the covariance function used to model the spatial dependence structure among the observations. Supported covariance model key words are: "exponential", "matern", "spherical", and "gaussian". See below for details.

n.samples

the number of posterior samples to collect.

n.omp.threads

a positive integer indicating the number of threads to use for SMP parallel processing. The package must be compiled for OpenMP support. For most Intel-based machines, we recommend setting n.omp.threads up to the number of hyperthreaded cores. Note, n.omp.threads > 1 might not work on some systems.

fit.rep

if TRUE, regression fitted and replicate data will be returned. The argument sub.sample controls which MCMC samples are used to generate the fitted and replicated data.

sub.sample

an optional list that specifies the samples used for fit.rep. Valid tags are start, end, and thin. Given the value associated with the tags, the sample subset is selected using seq(as.integer(start), as.integer(end), by=as.integer(thin)). The default values are start=floor(0.5*n.samples), end=n.samples and thin=1.

verbose

if TRUE, model specification and progress of the sampler is printed to the screen. Otherwise, nothing is printed to the screen.

search.type

a quoted keyword that specifies type of nearest neighbor search algorithm. Supported method key words are: "cb" and "brute". The "cb" should generally be much faster. If locations do not have identical coordinate values on the axis used for the nearest neighbor ordering (see ord argument) then "cb" and "brute" should produce identical neighbor sets. However, if there are identical coordinate values on the axis used for nearest neighbor ordering, then "cb" and "brute" might produce different, but equally valid, neighbor sets, e.g., if data are on a grid.

ord

an index vector of length \(n\) used for the nearest neighbor search. Internally, this vector is used to order coords, i.e., coords[ord,], and associated data. Nearest neighbor candidates for the i-th row in the ordered coords are rows 1:(i-1), with the n.neighbors nearest neighbors being those with the minimum euclidean distance to the location defined by ordered coords[i,]. The default used when ord is not specified is x-axis ordering, i.e., order(coords[,1]). This argument should typically be left blank. This argument will be ignored if the neighbor.info argument is used.

return.neighbor.info

if TRUE, a list called neighbor.info containing several data structures used for fitting the NNGP model is returned. If there is no change in input data or n.neighbors, this list can be passed to subsequent spNNGP calls via the neighbor.info argument to avoid the neighbor search, which can be time consuming if \(n\) is large. In addition to the several cryptic data structures in neighbor.info there is a list called n.indx that is of length \(n\). The i-th element in n.indx corresponds to the i-th row in coords[ord,] and holds the vector of that location's nearest neighbor indices. This list can be useful for plotting the neighbor graph if desired.

neighbor.info

see the return.neighbor.info argument description above.

n.report

the interval to report Metropolis sampler acceptance and MCMC progress.

...

currently no additional arguments.

Value

An object of class NNGP with additional class designations for method and family. The return object is a list comprising:

p.beta.samples

a coda object of posterior samples for the regression coefficients.

p.theta.samples

a coda object of posterior samples for covariance parameters.

p.w.samples

is a matrix of posterior samples for the spatial random effects, where rows correspond to locations in coords and columns hold the n.samples posterior samples. This is only returned if method="latent".

y.hat.samples

if fit.rep=TRUE, regression fitted values from posterior samples specified using sub.sample. See additional details below.

y.hat.quants

if fit.rep=TRUE, 0.5, 0.025, and 0.975 quantiles of the y.hat.samples.

y.rep.samples

if fit.rep=TRUE, replicated outcome from posterior samples specified using sub.sample. See additional details below.

y.rep.quants

if fit.rep=TRUE, 0.5, 0.025, and 0.975 quantiles of the y.rep.samples.

s.indx

if fit.rep=TRUE, the subset index specified with sub.sample.

neighbor.info

returned if return.neighbor.info=TRUE see the return.neighbor.info argument description above.

run.time

execution time for parameter estimation reported using proc.time(). This time does not include nearest neighbor search time for building the neighbor set.

The return object will include additional objects used for subsequent prediction and/or model fit evaluation.

Details

Model parameters can be fixed at their starting values by setting their tuning values to zero.

The no nugget model is specified by setting tau.sq to zero in the starting and tuning lists.

References

Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) Hierarchical Nearest-Neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, 10.1080/01621459.2015.1044091.

Finley, A.O., A. Datta, B.D. Cook, D.C. Morton, H.E. Andersen, and S. Banerjee. (2019) Efficient algorithms for Bayesian Nearest Neighbor Gaussian Processes. Journal of Computational and Graphical Statistics, 10.1080/10618600.2018.1537924.

Examples

Run this code

# NOT RUN {
rmvn <- function(n, mu=0, V = matrix(1)){
  p <- length(mu)
  if(any(is.na(match(dim(V),p))))
    stop("Dimension problem!")
  D <- chol(V)
  t(matrix(rnorm(n*p), ncol=p)%*%D + rep(mu,rep(n,p)))
}

##Make some data
set.seed(1)
n <- 100
coords <- cbind(runif(n,0,1), runif(n,0,1))

x <- cbind(1, rnorm(n))

B <- as.matrix(c(1,5))

sigma.sq <- 5
tau.sq <- 1
phi <- 3/0.5

D <- as.matrix(dist(coords))
R <- exp(-phi*D)
w <- rmvn(1, rep(0,n), sigma.sq*R)
y <- rnorm(n, x%*%B + w, sqrt(tau.sq))

##Fit a Response and Latent NNGP model
n.samples <- 500

starting <- list("phi"=phi, "sigma.sq"=5, "tau.sq"=1)

tuning <- list("phi"=0.5, "sigma.sq"=0.5, "tau.sq"=0.5)

priors <- list("phi.Unif"=c(3/1, 3/0.01), "sigma.sq.IG"=c(2, 5), "tau.sq.IG"=c(2, 1))

cov.model <- "exponential"

m.s <- spNNGP(y~x-1, coords=coords, starting=starting, method="latent", n.neighbors=10,
              tuning=tuning, priors=priors, cov.model=cov.model,
              n.samples=n.samples, n.omp.threads=1)

summary(m.s)
plot(apply(m.s$p.w.samples, 1, median), w)

m.r <- spNNGP(y~x-1, coords=coords, starting=starting, method="response", n.neighbors=10,
              tuning=tuning, priors=priors, cov.model=cov.model,
              n.samples=n.samples, n.omp.threads=1)

summary(m.r)

##Fit with some user defined neighbor ordering

##ord <- order(coords[,2]) ##y-axis 
ord <- order(coords[,1]+coords[,2]) ##x+y-axis
##ord <- sample(1:n, n) ##random

m.r.xy <- spNNGP(y~x-1, coords=coords, starting=starting, method="response", n.neighbors=10,
              tuning=tuning, priors=priors, cov.model=cov.model,
              ord=ord, return.neighbor.info=TRUE,
              n.samples=n.samples, n.omp.threads=1)

summary(m.r.xy)

# }
# NOT RUN {
##Visualize the neighbor sets and ordering constraint
n.indx <- m.r.xy$neighbor.info$n.indx
ord <- m.r.xy$neighbor.info$ord

##This is how the data are ordered internally for model fitting
coords.ord <- coords[ord,]

for(i in 1:n){

    plot(coords.ord, cex=1, xlab="Easting", ylab="Northing")
    points(coords.ord[i,,drop=FALSE], col="blue", pch=19, cex=1)
    points(coords.ord[n.indx[[i]],,drop=FALSE], col="red", pch=19, cex=1)

    readline(prompt = "Pause. Press <Enter> to continue...")
}
# }
# NOT RUN {

# }

Run the code above in your browser using DataLab