spConjNNGP: Function for fitting univariate Bayesian conjugate spatial regression models

Description

The function spConjNNGP fits Gaussian univariate Bayesian conjugate spatial regression models using Nearest Neighbor Gaussian Processes (NNGP).

Usage

spConjNNGP(formula, data = parent.frame(), coords, n.neighbors = 15,
               theta.alpha, sigma.sq.IG, cov.model = "exponential",
               k.fold = 5, score.rule = "crps",
               X.0, coords.0, 
               n.omp.threads = 1, search.type = "tree",
               return.neighbors = FALSE, verbose = TRUE, ...)

Arguments

formula

a symbolic description of the regression model to be fit. See example below.

data

an optional data frame containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which spConjNNGP is called.

coords

an \(n \times 2\) matrix of the observation coordinates in \(R^2\) (e.g., easting and northing).

n.neighbors

number of neighbors used in the NNGP.

theta.alpha

a vector or matrix of parameter values for phi, nu, and alpha, where \(\alpha=\tau^2/\sigma^2\) and nu is only required if cov.model="matern". A vector is passed to run the model using one set of parameters. The vector elements must be named and hold values for phi, nu, and alpha. If a matrix is passed, columns must be named and hold values for phi, alpha, and nu. Each row in the matrix defines a set of parameters for which the model will be run.

sigma.sq.IG

a vector of length two that holds the hyperparameters, shape and scale respectively, for the inverse-Gamma prior on \(\sigma^2\).

cov.model

a quoted keyword that specifies the covariance function used to model the spatial dependence structure among the observations. Supported covariance model key words are: "exponential", "matern", "spherical", and "gaussian". See below for details.

k.fold

specifies the number of k folds for cross-validation. If theta.alpha is a vector then cross-validation is not performed and k-fold and score.rule are ignored. In k-fold cross-validation, the data specified in model is randomly partitioned into k equal sized subsamples. Of the k subsamples, k-1 subsamples are used to fit the model and the remaining k samples are used for prediction. The cross-validation process is repeated k times (the folds). Root mean squared prediction error (RMSPE) and continuous ranked probability score (CRPS; Gneiting and Raftery, 2007) rules are averaged over the k fold prediction results and reported for the parameter sets defined by theta.alpha. The parameter set that yields the best performance based on the scoring rule defined by score.rule is used fit the final model that uses all the data and make predictions if X.0 and coords.0 are specified. Results from the k-fold cross-validation are returned in the k.fold.scores matrix.

score.rule

a quoted keyword "rmspe" or "crps" that specifies the scoring rule used to select the best parameter set, see argument definition for k.fold for more details.

X.0

the design matrix for prediction locations. An intercept should be provided in the first column if one is specified in model.

coords.0

the spatial coordinates corresponding to X.0.

n.omp.threads

a positive integer indicating the number of threads to use for SMP parallel processing. The package must be compiled for OpenMP support. For most Intel-based machines, we recommend setting n.omp.threads up to the number of hyperthreaded cores.

search.type

a quoted keyword that specifies type of nearest neighbor search algorithm. Supported method key words are: "tree" and "brute" both will yield the same solution but "tree" should be much faster.

return.neighbors

if TRUE, a list containing the indices for each location's nearest neighbors will be returned along with ordered data used to fit a NNGP model. This argument should typically be FALSE. See n.indx below for more details.

verbose

if TRUE, model specification and progress is printed to the screen. Otherwise, nothing is printed to the screen.

...

currently no additional arguments.

Value

An object of class cNNGP, which is a list comprising:

theta.alpha

the input theta.alpha vector, or best (according to the selected scoring rule) set of parameters in the theta.alpha matrix. All subsequent parameter estimates are based on this parameter set.

beta.hat

a matrix of regression coefficient estimates corresponding to the returned theta.alpha.

beta.var

beta.hat variance-covariance matrix.

sigma.sq.hat

estimate of \(\sigma^2\) corresponding to the returned theta.alpha.

sigma.sq.var

sigma.sq.hat variance.

k.fold.scores

results from the k-fold cross-validation if theta.alpha is a matrix.

y.0.hat

prediction if X.0 and coords.0 are specified.

y.0.var.hat

y.0.hat variance.

n.neighbors

number of neighbors used in the NNGP.

n.indx

if return.neighbors=TRUE then n.indx will be a list of length \(n\). The i-th element in the list corresponds to the i-th row in coords.ord matrix and the elements are the nearest neighbor indices for the given location.

ord

if return.neighbors=TRUE the vector ord=order(coords[,1]), which is the vector of indices used to order data necessary for fitting the NNGP model, is returned.

coords.ord

if return.neighbors=TRUE then coords.ord = coords[ord,] is returned.

y.ord

if return.neighbors=TRUE then y.ord = y[ord] is returned.

X.ord

if return.neighbors=TRUE then X.ord = X[ord,,drop=FALSE] is returned.

run.time

execution time for building the nearest neighbor index and parameter estimation reported using proc.time().

References

Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) Hierarchical Nearest-Neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, 111:800-812.

Finley, A.O., A. Datta, B.C. Cook, D.C. Morton, H.E. Andersen, and S. Banerjee (2017) Applying Nearest Neighbor Gaussian Processes to massive spatial data sets: Forest canopy height prediction across Tanana Valley Alaska, https://arxiv.org/abs/1702.00434v2.

Gneiting, T and A.E. Raftery. (2007) Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102:359-378.

Examples

Run this code

# NOT RUN {
rmvn <- function(n, mu=0, V = matrix(1)){
  p <- length(mu)
  if(any(is.na(match(dim(V),p))))
    stop("Dimension problem!")
  D <- chol(V)
  t(matrix(rnorm(n*p), ncol=p)%*%D + rep(mu,rep(n,p)))
}

##Make some data
set.seed(1)
n <- 2000
coords <- cbind(runif(n,0,1), runif(n,0,1))

x <- cbind(1, rnorm(n))

B <- as.matrix(c(1,5))

sigma.sq <- 5
tau.sq <- 1
phi <- 3/0.5

D <- as.matrix(dist(coords))
R <- exp(-phi*D)
w <- rmvn(1, rep(0,n), sigma.sq*R)
y <- rnorm(n, x%*%B + w, sqrt(tau.sq))

ho <- sample(1:n, 1000)

y.ho <- y[ho]
x.ho <- x[ho,,drop=FALSE]
w.ho <- w[ho]
coords.ho <- coords[ho,]

y <- y[-ho]
x <- x[-ho,,drop=FALSE]
w <- w[-ho,,drop=FALSE]
coords <- coords[-ho,]

##Fit a Conjugate NNGP model and predict for the holdout
sigma.sq.IG <- c(2, sigma.sq)

cov.model <- "exponential"

g <- 10
theta.alpha <- cbind(seq(phi,30,length.out=g), seq(tau.sq/sigma.sq,5,length.out=g))

colnames(theta.alpha) <- c("phi", "alpha")

##one thread
m.c <- spConjNNGP(y~x-1, coords=coords, n.neighbors = 10,
                  X.0 = x.ho, coords.0 = coords.ho,
                  k.fold = 5, score.rule = "crps",
                  n.omp.threads = 1,
                  theta.alpha = theta.alpha, sigma.sq.IG = sigma.sq.IG, cov.model = cov.model)

m.c$beta.hat
m.c$theta.alpha.sigmaSq
m.c$k.fold.scores

##two threads
m.c <- spConjNNGP(y~x-1, coords=coords, n.neighbors = 10,
                  X.0 = x.ho, coords.0 = coords.ho,
                  k.fold = 5, score.rule = "crps",
                  n.omp.threads = 2,
                  theta.alpha = theta.alpha, sigma.sq.IG = sigma.sq.IG, cov.model = cov.model)

m.c$beta.hat
m.c$sigmaSq.hat
m.c$k.fold.scores

# }

Run the code above in your browser using DataLab