spGLM: Function for fitting univariate Bayesian generalized linear spatial regression models

Description

The function spGLM fits univariate Bayesian generalized linear spatial regression models. Given a set of knots, spGLM will also fit a predictive process model (see references below).

Usage

spGLM(formula, family="binomial", weights, data = parent.frame(),
      coords, knots, starting, tuning, priors, cov.model,
      amcmc, n.samples, verbose=TRUE,
      n.report=100, ...)

Arguments

formula

a symbolic description of the regression model to be fit. See example below.

family

currently only supports binomial and poisson data using the logit and log link functions, respectively.

weights

an optional vector of weights to be used in the fitting process. Weights correspond to number of trials and offset for each location for the binomial and poisson family, respectively.

data

an optional data frame containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which spGLM is called.

coords

an $n \times 2$ matrix of the observation coordinates in $R^2$ (e.g., easting and northing).

knots

either a $m \times 2$ matrix of the predictive process knot coordinates in $R^2$ (e.g., easting and northing) or a vector of length two or three with the first and second elements recording the number of columns and rows in the desired

starting

a list with each tag corresponding to a parameter name. Valid tags are beta, sigma.sq, phi, nu, and w. The value portion of each tag is the parameter's starting value. If the predic

tuning

a list with each tag corresponding to a parameter name. Valid tags are beta, sigma.sq, phi, nu, and w. The value portion of each tag defines the variance of the Metropolis sample

priors

a list with each tag corresponding to a parameter name. Valid tags are sigma.sq.ig, phi.unif, nu.unif, beta.norm, and beta.flat. Variance parameter simga.sq is assu

cov.model

a quoted keyword that specifies the covariance function used to model the spatial dependence structure among the observations. Supported covariance model key words are: "exponential", "matern", "spherical"

amcmc

a list with tags n.batch, batch.length, and accept.rate. Specifying this argument invokes an adaptive MCMC sampler, see Roberts and Rosenthal (2007) for an explanation.

n.samples

the number of MCMC iterations. This argument is ignored if amcmc is specified.

verbose

if TRUE, model specification and progress of the sampler is printed to the screen. Otherwise, nothing is printed to the screen.

n.report

the interval to report Metropolis sampler acceptance and MCMC progress.

...

currently no additional arguments.

Value

An object of class spGLM, which is a list with the following tags:
coordsthe $n \times 2$ matrix specified by coords.
knot.coordsthe $m \times 2$ matrix as specified by knots.
p.beta.theta.samplesa coda object of posterior samples for the defined parameters.
acceptancethe Metropolis sampler acceptance rate. If amcmc is used then this will be a matrix of each parameter's acceptance rate at the end of each batch. Otherwise, the sampler is a Metropolis with a joint proposal of all parameters.
acceptance.wif this is a non-predictive process model and amcmc is used then this will be a matrix of the Metropolis sampler acceptance rate for each location's spatial random effect.
acceptance.w.knotsif this is a predictive process model and amcmc is used then this will be a matrix of the Metropolis sampler acceptance rate for each knot's spatial random effect.
p.w.knots.samplesa matrix that holds samples from the posterior distribution of the knots' spatial random effects. The rows of this matrix correspond to the $m$ knot locations and the columns are the posterior samples. This is only returned if a predictive process model is used.
p.w.samplesa matrix that holds samples from the posterior distribution of the locations' spatial random effects. The rows of this matrix correspond to the $n$ point observations and the columns are the posterior samples.
The return object might include additional data used for subsequent prediction and/or model fit evaluation.

Details

If a binomial model is specified the response vector is the number of successful trials at each location and weights is the total number of trials at each location.

For a poisson specification, the weights vector is the count offset, e.g., population, at each location. This differs from the glm offset argument which is passed as the log of this value. A non-spatial model is fit when coords is not specified. See example below.

References

Finley, A.O., S. Banerjee, and R.E. McRoberts. (2008) A Bayesian approach to quantifying uncertainty in multi-source forest area estimates. Environmental and Ecological Statistics, 15:241--258. Banerjee, S., A.E. Gelfand, A.O. Finley, and H. Sang. (2008) Gaussian Predictive Process Models for Large Spatial Datasets. Journal of the Royal Statistical Society Series B, 70:825--848.

Banerjee, S., Carlin, B.P., and Gelfand, A.E. (2004) Hierarchical modeling and analysis for spatial data. Chapman and Hall/CRC Press, Boca Raton, Fla. Roberts G.O. and Rosenthal J.S. (2006) Examples of Adaptive MCMC. http://probability.ca/jeff/ftpdir/adaptex.pdf Preprint.

Examples

Run this code

library(MBA)

set.seed(1)

rmvn <- function(n, mu=0, V = matrix(1)){
  p <- length(mu)
  if(any(is.na(match(dim(V),p))))
    stop("Dimension problem!")
  D <- chol(V)
  t(matrix(rnorm(n*p), ncol=p) %*% D + rep(mu,rep(n,p)))
}

################################
##Spatial binomial
################################

##Generate binary data
coords <- as.matrix(expand.grid(seq(0,100,length.out=8), seq(0,100,length.out=8)))
n <- nrow(coords)

phi <- 3/50
sigma.sq <- 2

R <- sigma.sq*exp(-phi*as.matrix(dist(coords)))
w <- rmvn(1, rep(0,n), R)

x <- as.matrix(rep(1,n))
beta <- 0.1
p <- 1/(1+exp(-(x%*%beta+w)))

weights <- rep(1, n)
weights[coords[,1]>mean(coords[,1])] <- 10

y <- rbinom(n, size=weights, prob=p)

##Collect samples
fit <- glm((y/weights)~x-1, weights=weights, family="binomial")
beta.starting <- coefficients(fit)
beta.tuning <- t(chol(vcov(fit)))

n.batch <- 200
batch.length <- 50
n.samples <- n.batch*batch.length

m.1 <- spGLM(y~1, family="binomial", coords=coords, weights=weights, 
             starting=list("beta"=beta.starting, "phi"=0.06,"sigma.sq"=1, "w"=0),
             tuning=list("beta"=beta.tuning, "phi"=0.5, "sigma.sq"=0.5, "w"=0.5),
             priors=list("beta.Normal"=list(0,10), "phi.Unif"=c(0.03, 0.3), "sigma.sq.IG"=c(2, 1)),
             amcmc=list("n.batch"=n.batch, "batch.length"=batch.length, "accept.rate"=0.43),
             cov.model="exponential", verbose=TRUE, n.report=10)

burn.in <- 0.9*n.samples
sub.samps <- burn.in:n.samples

print(summary(window(m.1$p.beta.theta.samples, start=burn.in)))

beta.hat <- m.1$p.beta.theta.samples[sub.samps,"(Intercept)"]
w.hat <- m.1$p.w.samples[,sub.samps]

p.hat <- 1/(1+exp(-(x%*%beta.hat+w.hat)))

y.hat <- apply(p.hat, 2, function(x){rbinom(n, size=weights, prob=p)})

y.hat.mu <- apply(y.hat, 1, mean)
y.hat.var <- apply(y.hat, 1, var)

##Take a look
par(mfrow=c(1,2))
surf <- mba.surf(cbind(coords,y.hat.mu),no.X=100, no.Y=100, extend=TRUE)$xyz.est
image(surf, main="Interpolated mean of posterior rate\n(observed rate)")
contour(surf, add=TRUE)
text(coords, label=paste("(",y,")",sep=""))

surf <- mba.surf(cbind(coords,y.hat.var),no.X=100, no.Y=100, extend=TRUE)$xyz.est
image(surf, main="Interpolated variance of posterior rate\n(observed #
of trials)")
contour(surf, add=TRUE)
text(coords, label=paste("(",weights,")",sep=""))

###########################
##Spatial poisson
###########################
##Generate count data
set.seed(1)

n <- 100

coords <- cbind(runif(n,1,100),runif(n,1,100))

phi <- 3/50
sigma.sq <- 2

R <- sigma.sq*exp(-phi*as.matrix(dist(coords)))
w <- rmvn(1, rep(0,n), R)

x <- as.matrix(rep(1,n))
beta <- 0.1
y <- rpois(n, exp(x%*%beta+w))

##Collect samples
beta.starting <- coefficients(glm(y~x-1, family="poisson"))
beta.tuning <- t(chol(vcov(glm(y~x-1, family="poisson"))))

n.batch <- 500
batch.length <- 50
n.samples <- n.batch*batch.length

##Note tuning list is now optional

m.1 <- spGLM(y~1, family="poisson", coords=coords,
             starting=list("beta"=beta.starting, "phi"=0.06,"sigma.sq"=1, "w"=0),
             tuning=list("beta"=0.1, "phi"=0.5, "sigma.sq"=0.5, "w"=0.5),
             priors=list("beta.Flat", "phi.Unif"=c(0.03, 0.3), "sigma.sq.IG"=c(2, 1)),
             amcmc=list("n.batch"=n.batch, "batch.length"=batch.length, "accept.rate"=0.43),
             cov.model="exponential", verbose=TRUE, n.report=10)

##Just for fun check out the progression of the acceptance
##as it moves to 43% (same can be seen for the random spatial effects).
plot(mcmc(t(m.1$acceptance)), density=FALSE, smooth=FALSE)

##Now parameter summaries, etc.
burn.in <- 0.9*n.samples
sub.samps <- burn.in:n.samples

m.1$p.samples[,"phi"] <- 3/m.1$p.samples[,"phi"]

plot(m.1$p.beta.theta.samples)
print(summary(window(m.1$p.beta.theta.samples, start=burn.in)))

beta.hat <- m.1$p.beta.theta.samples[sub.samps,"(Intercept)"]
w.hat <- m.1$p.w.samples[,sub.samps]

y.hat <- apply(exp(x%*%beta.hat+w.hat), 2, function(x){rpois(n, x)})

y.hat.mu <- apply(y.hat, 1, mean)

##Take a look
par(mfrow=c(1,2))
surf <- mba.surf(cbind(coords,y),no.X=100, no.Y=100, extend=TRUE)$xyz.est
image(surf, main="Observed counts")
contour(surf, add=TRUE)
text(coords, labels=y, cex=1)

surf <- mba.surf(cbind(coords,y.hat.mu),no.X=100, no.Y=100, extend=TRUE)$xyz.est
image(surf, main="Fitted counts")
contour(surf, add=TRUE)
text(coords, labels=round(y.hat.mu,0), cex=1)

Run the code above in your browser using DataLab