The function spConjNNGP
fits Gaussian univariate Bayesian conjugate spatial
regression models using Nearest Neighbor Gaussian Processes (NNGP).
spConjNNGP(formula, data = parent.frame(), coords, knots, n.neighbors = 15,
theta.alpha, sigma.sq.IG, cov.model = "exponential",
k.fold = 5, score.rule = "crps",
X.0, coords.0, n.omp.threads = 1, search.type = "cb",
ord, return.neighbor.info = TRUE,
neighbor.info, fit.rep = FALSE, n.samples, verbose = TRUE, ...)
An object of class NNGP
and conjugate
, and, if knots
is provided, SLGP
. Among other elements, the object contains:
the input theta.alpha
vector, or best (according to the selected scoring
rule) set of parameters in the theta.alpha
matrix. All
subsequent parameter estimates are based on this parameter set.
a matrix of regression coefficient estimates
corresponding to the returned theta.alpha
.
beta.hat
variance-covariance matrix.
estimate of \(\sigma^2\) corresponding to the returned theta.alpha
.
sigma.sq.hat
variance.
results from the k-fold cross-validation if
theta.alpha
is a matrix.
prediction if X.0
and coords.0
are
specified.
y.0.hat
variance.
number of neighbors used in the NNGP.
returned if return.neighbor.info=TRUE
see the return.neighbor.info
argument
description above.
execution time for parameter estimation reported using
proc.time()
. This time does not include nearest neighbor
search time for building the neighbor set.
a symbolic description of the regression model to be fit. See example below.
an optional data frame containing the variables in the
model. If not found in data, the variables are taken from
environment(formula)
, typically the environment from which
spConjNNGP
is called.
an \(n \times 2\) matrix of the observation
coordinates in \(R^2\) (e.g., easting and northing), or if
data
is a data frame then coords
can be a vector of
length two comprising coordinate column names or indices. There can
be no duplicate locations.
an \(r \times 2\) matrix of the observation coordinates
in \(R^2\) (e.g., easting and northing). Adding the
knots
argument invokes SLGP, see Shin et al. (2019) below.
number of neighbors used in the NNGP.
a vector or matrix of parameter values for
phi
, nu
, and alpha
, where
\(\alpha=\tau^2/\sigma^2\) and nu
is
only required if cov.model="matern"
. A vector is passed to
run the model using one set of parameters. The vector elements must
be named and hold values for phi
, nu
, and
alpha
. If a matrix is passed, columns must be named and hold
values for phi
, nu
, and alpha
. Each row in the matrix defines a set of parameters for which the model will be run.
a vector of length two that holds the hyperparameters, shape and scale respectively, for the inverse-Gamma prior on \(\sigma^2\).
a quoted keyword that specifies the covariance
function used to model the spatial dependence structure among the
observations. Supported covariance model key words are:
"exponential"
, "matern"
, "spherical"
, and
"gaussian"
. See below for details.
specifies the number of
k folds for cross-validation. If theta.alpha
is a
vector then cross-validation is not performed and
k-fold
and score.rule
are ignored. In k-fold
cross-validation, the data specified in model
is randomly
partitioned into k equal sized subsamples. Of the k
subsamples, k-1 subsamples are used to fit the model and the
remaining k samples are used for prediction. The
cross-validation process is repeated k times (the folds). Root
mean squared prediction error (RMSPE) and continuous ranked
probability score (CRPS; Gneiting and Raftery, 2007) rules are
averaged over the k fold prediction results and reported for
the parameter sets defined by theta.alpha
. The parameter set
that yields the best performance based on the scoring rule
defined by score.rule
is used to fit the final model that uses all
the data and make predictions if X.0
and coords.0
are
supplied. Results from the k-fold cross-validation are returned in
the k.fold.scores
matrix.
a quoted keyword "rmspe"
or "crps"
that specifies the scoring rule used to select the best
parameter set, see argument definition for k.fold
for more
details.
the design matrix for prediction locations. An
intercept should be provided in the first column if one is specified
in model
.
the spatial coordinates corresponding to
X.0
.
a positive integer indicating
the number of threads to use for SMP parallel processing. The package must
be compiled for OpenMP support. For most Intel-based machines, we recommend setting
n.omp.threads
up to the number of hyperthreaded cores. Note, n.omp.threads
> 1 might not
work on some systems.
a quoted keyword that specifies type of nearest
neighbor search algorithm. Supported method key words are: "cb"
and
"brute"
. The "cb"
should generally be much
faster. If locations do not have identical coordinate values on
the axis used for the nearest neighbor ordering (see
ord
argument) then "cb"
and "brute"
should
produce identical neighbor sets. However, if there are identical
coordinate values on the axis used for nearest neighbor ordering,
then "cb"
and "brute"
might produce different, but
equally valid, neighbor sets, e.g., if data are on a grid.
an index vector of length \(n\) used for the nearest neighbor
search. Internally, this vector is used to order coords
,
i.e., coords[ord,]
, and associated data. Nearest neighbor
candidates for the i-th row in the ordered coords
are
rows 1:(i-1)
, with the n.neighbors
nearest neighbors being
those with the minimum euclidean distance to the location defined by
ordered coords[i,]
. The default used when ord
is not
specified is x-axis ordering, i.e.,
order(coords[,1])
. This argument should typically be
left blank. This argument will be ignored if the neighbor.info
argument is used.
if TRUE
, a list called
neighbor.info
containing several data structures used for
fitting the NNGP model is returned. If there is no change in input
data or n.neighbors
, this list can be passed to subsequent
spNNGP
calls via the neighbor.info
argument to avoid
the neighbor search, which can be time consuming if \(n\) is
large. In addition to the several cryptic data structures in
neighbor.info
there is a list called n.indx
that is of
length \(n\). The i-th element in n.indx
corresponds to the i-th row in coords[ord,]
and holds
the vector of that location's nearest neighbor indices. This list can be useful for plotting the
neighbor graph if desired.
see the return.neighbor.info
argument
description above.
if TRUE
, regression fitted and replicate data will be
returned. The argument n.samples
controls the number of
fitted and replicated data samples.
gives the number of posterior samples
returned. Note, point and associated variance estimates for model
parameters are not based on posterior samples. Only specify
n.samples
if you wish to generate samples from parameters'
posteriors (this is an exact sampling algorithm). If
fit.rep
is TRUE
, then n.samples
also controls
the number of fitted and replicated data samples.
if TRUE
, model specification and progress is printed to the screen. Otherwise, nothing is printed to
the screen.
currently no additional arguments.
Andrew O. Finley finleya@msu.edu,
Abhirup Datta abhidatta@jhu.edu,
Sudipto Banerjee sudipto@ucla.edu
Datta, A., S. Banerjee, A.O. Finley, and A.E. Gelfand. (2016) Hierarchical Nearest-Neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, tools:::Rd_expr_doi("10.1080/01621459.2015.1044091").
Finley, A.O., A. Datta, S. Banerjee (2022) spNNGP R Package for Nearest Neighbor Gaussian Process Models. Journal of Statistical Software, tools:::Rd_expr_doi("10.18637/jss.v103.i05").
Finley, A.O., A. Datta, B.D. Cook, D.C. Morton, H.E. Andersen, and S. Banerjee. (2019) Efficient algorithms for Bayesian Nearest Neighbor Gaussian Processes. Journal of Computational and Graphical Statistics, tools:::Rd_expr_doi("10.1080/10618600.2018.1537924").
Gneiting, T and A.E. Raftery. (2007) Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, tools:::Rd_expr_doi("10.1198/016214506000001437").
Shirota, S., A.O. Finley, B.D. Cook, and S. Banerjee (2019) Conjugate Nearest Neighbor Gaussian Process models for efficient statistical interpolation of large spatial data. https://arxiv.org/abs/1907.10109.
rmvn <- function(n, mu=0, V = matrix(1)){
p <- length(mu)
if(any(is.na(match(dim(V),p))))
stop("Dimension problem!")
D <- chol(V)
t(matrix(rnorm(n*p), ncol=p)%*%D + rep(mu,rep(n,p)))
}
##Make some data
set.seed(1)
n <- 2000
coords <- cbind(runif(n,0,1), runif(n,0,1))
x <- cbind(1, rnorm(n))
B <- as.matrix(c(1,5))
sigma.sq <- 5
tau.sq <- 1
phi <- 3/0.5
D <- as.matrix(dist(coords))
R <- exp(-phi*D)
w <- rmvn(1, rep(0,n), sigma.sq*R)
y <- rnorm(n, x%*%B + w, sqrt(tau.sq))
ho <- sample(1:n, 1000)
y.ho <- y[ho]
x.ho <- x[ho,,drop=FALSE]
w.ho <- w[ho]
coords.ho <- coords[ho,]
y <- y[-ho]
x <- x[-ho,,drop=FALSE]
w <- w[-ho,,drop=FALSE]
coords <- coords[-ho,]
##Fit a Conjugate NNGP model and predict for the holdout
sigma.sq.IG <- c(2, sigma.sq)
cov.model <- "exponential"
g <- 10
theta.alpha <- cbind(seq(phi,30,length.out=g), seq(tau.sq/sigma.sq,5,length.out=g))
colnames(theta.alpha) <- c("phi", "alpha")
m.c <- spConjNNGP(y~x-1, coords=coords, n.neighbors = 10,
X.0 = x.ho, coords.0 = coords.ho,
k.fold = 5, score.rule = "crps",
n.omp.threads = 1,
theta.alpha = theta.alpha, sigma.sq.IG = sigma.sq.IG, cov.model = cov.model)
m.c
Run the code above in your browser using DataLab