spMvGLM: Function for fitting multivariate Bayesian generalized linear spatial regression models

Description

The function spMvGLM fits multivariate Bayesian generalized linear spatial regression models. Given a set of knots, spMvGLM will also fit a predictive process model (see references below).

Usage

spMvGLM(formula, family="binomial", weights, data = parent.frame(), coords, knots,
      amcmc, starting, tuning, priors, cov.model, n.samples, sub.samples,
      verbose=TRUE, n.report=100, ...)

Arguments

formula

for a multivariate model with $q$ response variables, this is a list of univariate formulas. See example below.

family

currently only supports binomial and poisson data using the logit and log link functions, respectively.

weights

an optional $n \times q$ matrix of weights to be used in the fitting process when family is binomial. Here the order of the columns correspond to the univariate models in the formula list. Unless otherwise specified a m

data

an optional data frame containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which spMvGLM is called.

coords

an $n \times 2$ matrix of the observation coordinates in $R^2$ (e.g., easting and northing).

knots

either a $m \times 2$ matrix of the predictive process knot coordinates in $R^2$ (e.g., easting and northing) or a vector of length two or three with the first and second elements recording the number of columns and rows in the desired

amcmc

a list with tags n.batch, batch.length, and accept.rate.

starting

a list with each tag corresponding to a parameter name. Valid list tags are beta, A, phi, and nu. The value portion of each tag is a vector of parameter's starting value. For A the

tuning

a list with each tag corresponding to a parameter name. Valid list tags are beta, A, phi, nu, and w. The value portion of each tag defines the variance of the Metropolis normal propo

priors

a list with each tag corresponding to a parameter name. Valid list tags are beta.flat, beta.normal, K.IW, Psi.IW, phi.unif, and nu.unif. If beta.normal

cov.model

a quoted key word that specifies the covariance function used to model the spatial dependence structure among the observations. Supported covariance model key words are: "exponential", "matern", "spherical"

n.samples

the number of MCMC iterations.

sub.samples

a vector of length 3 that specifies start, end, and thin, respectively, of the MCMC samples. The default is c(1, n.samples, 1) (i.e., all samples).

verbose

if TRUE, model specification and progress of the sampler is printed to the screen. Otherwise, nothing is printed to the screen.

n.report

the interval to report Metropolis acceptance and MCMC progress.

...

currently no additional arguments.

Value

An object of class spMvGLM, which is a list with the following tags:
coordsthe $n \times 2$ matrix specified by coords.
knot.coordsthe $m \times 2$ matrix as specified by knots.
p.samplesa coda object of posterior samples for the defined parameters.
acceptancethe Metropolis sampling acceptance rate. If amcmc is used then this will be a matrix of each parameter's acceptance rate at the end of each batch. Otherwise, the sampler is a Metropolis with a joint proposal of all parameters, and the acceptance rate is the average over all proposals.
acceptance.wIf this is a non-predictive process model and amcmc is used then this will be a matrix of each random spatial effects acceptance rate at the end of each batch.
acceptance.w.strIf this is a predictive process model and amcmc is used then this will be a matrix of each random spatial effects acceptance rate at the end of each batch.
sp.effectsa matrix that holds samples from the posterior distribution of the spatial random effects. The rows of this matrix correspond to the $n$ point observations and the columns are the posterior samples.
The return object might include additional data used for subsequent prediction and/or model fit evaluation.

Details

If a binomial model is specified the response vector is the number of successful trials at each location. Note, this is different from the glm argument which is passed as a rate.

References

Finley, A.O., S. Banerjee, and R.E. McRoberts. (2008) A Bayesian approach to quantifying uncertainty in multi-source forest area estimates. Environmental and Ecological Statistics, 15:241--258.

Banerjee, S., A.E. Gelfand, A.O. Finley, and H. Sang. (2008) Gaussian Predictive Process Models for Large Spatial Datasets. Journal of the Royal Statistical Society Series B, 70:825--848.

Finley, A.O., S. Banerjee, P. Waldmann, and T. Ericsson. (2008). Hierarchical spatial modeling of additive and dominance genetic variance for large spatial trial datasets. Biometrics. DOI: 10.1111/j.1541-0420.2008.01115.x

Finley, A.O,. H. Sang, S. Banerjee, and A.E. Gelfand. (2008). Improving the performance of predictive process modeling for large datasets. Computational Statistics and Data Analysis, DOI: 10.1016/j.csda.2008.09.008 Banerjee, S., Carlin, B.P., and Gelfand, A.E. (2004). Hierarchical modeling and analysis for spatial data. Chapman and Hall/CRC Press, Boca Raton, Fla. Roberts G.O. and Rosenthal J.S. (2006) Examples of Adaptive MCMC. http://probability.ca/jeff/ftpdir/adaptex.pdf Preprint.

Examples

Run this code

library(MASS)
library(MBA)
library(fields)

################################
##Spatial multivariate binomial
################################

##Generate some data
n <- 50
q <- 3   
nltr <- q*(q-1)/2+q

coords <- cbind(runif(n,1,100),runif(n,1,100))

theta <- rep(3/50,q)

A <- matrix(0,q,q)
A[lower.tri(A,TRUE)] <- rnorm(nltr,1,1)
K <- A%*%t(A)

Psi <- diag(0,q)

c1 <- mvCovInvLogDet(coords=coords, knots=coords,
                     cov.model="exponential",
                     V=K, Psi=Psi, theta=theta,
                     modified.pp=TRUE, SWM=FALSE)

w <- mvrnorm(1,rep(0,nrow(c1$C)),c1$C)

X <- mkMvX(list(matrix(1,n,1), matrix(1,n,1), matrix(1,n,1)))
beta <- c(-1,0,1)

weight <- 10 ##i.e., trials 
p <- 1/(1+exp(-(X%*%beta+w)))
y <- rbinom(n*q, rep(weight,n*q), prob=p)

y.1 <- y[seq(1,length(y),q)]
y.2 <- y[seq(2,length(y),q)]
y.3 <- y[seq(3,length(y),q)]

##Specify starting values and collect samples
A.starting <- diag(1,q)[lower.tri(diag(1,q), TRUE)]

fit <- glm((y/weight)~X-1, weights=rep(weight, n*q), family="binomial")
beta.starting <- coefficients(fit)
beta.tuning <- t(chol(vcov(fit)))

n.batch <- 200
batch.length <- 50
n.samples <- n.batch*batch.length

m.1 <- spMvGLM(list(y.1~1,y.2~1,y.3~1), weights=matrix(weight,n,q),
               family="binomial", coords=coords, 
               starting=
               list("beta"=beta.starting, "phi"=rep(0.06,q),"A"=A.starting, "w"=0), 
               tuning=
               list("beta"=beta.tuning, "phi"=rep(0.05,q),"A"=rep(0.01,nltr), "w"=0.001),
               priors=
               list("beta.Flat", "phi.Unif"=rep(c(0.03, 0.3),q),"K.IW"=list(q+1, diag(0.1,q))),
               amcmc=list("n.batch"=n.batch,"batch.length"=batch.length,"accept.rate"=0.43),
               cov.model="exponential", sub.sample=c(5000,n.samples,10),
               verbose=TRUE, n.report=100)

print(summary(mcmc(m.1$p.samples)))

beta.hat <- colMeans(m.1$p.samples[,1:q])
w.hat <- rowMeans(m.1$sp.effects)

p.hat <- 1/(1+exp(-(X%*%beta.hat+w.hat)))

p.hat.1 <- p.hat[seq(1,length(p.hat),q)]
p.hat.2 <- p.hat[seq(2,length(p.hat),q)]
p.hat.3 <- p.hat[seq(3,length(p.hat),q)]

##Take a look
par(mfrow=c(3,2))
surf <- mba.surf(cbind(coords,y.1/weight),no.X=100, no.Y=100, extend=TRUE)$xyz.est
image.plot(surf, main="Observed rates")

surf <- mba.surf(cbind(coords,p.hat.1),no.X=100, no.Y=100, extend=TRUE)$xyz.est
image.plot(surf, main="Fitted rates")

surf <- mba.surf(cbind(coords,y.2/weight),no.X=100, no.Y=100, extend=TRUE)$xyz.est
image.plot(surf)

surf <- mba.surf(cbind(coords,p.hat.2),no.X=100, no.Y=100, extend=TRUE)$xyz.est
image.plot(surf)

surf <- mba.surf(cbind(coords,y.3/weight),no.X=100, no.Y=100, extend=TRUE)$xyz.est
image.plot(surf)

surf <- mba.surf(cbind(coords,p.hat.3),no.X=100, no.Y=100, extend=TRUE)$xyz.est
image.plot(surf)

##Prediction
m <- 200
pred.coords <- cbind(runif(m,1,100),runif(m,1,100))
pred.covars <- mkMvX(list(matrix(1,m,1), matrix(1,m,1), matrix(1,m,1)))

pred <- spPredict(m.1, pred.coords, pred.covars)

pred.p <- rowMeans(pred$y.pred)
pred.p.1 <- pred.p[seq(1,length(pred.p),q)]
pred.p.2 <- pred.p[seq(2,length(pred.p),q)]
pred.p.3 <- pred.p[seq(3,length(pred.p),q)]

##Take a look
par(mfrow=c(3,2))
surf <- mba.surf(cbind(coords,y.1/weight),no.X=100, no.Y=100, extend=TRUE)$xyz.est
image.plot(surf, main="Observed rates")

surf <- mba.surf(cbind(pred.coords,pred.p.1),no.X=100, no.Y=100, extend=TRUE)$xyz.est
image.plot(surf, main="Fitted rates")

surf <- mba.surf(cbind(coords,y.2/weight),no.X=100, no.Y=100, extend=TRUE)$xyz.est
image.plot(surf)

surf <- mba.surf(cbind(pred.coords,pred.p.2),no.X=100, no.Y=100, extend=TRUE)$xyz.est
image.plot(surf)

surf <- mba.surf(cbind(coords,y.3/weight),no.X=100, no.Y=100, extend=TRUE)$xyz.est
image.plot(surf)

surf <- mba.surf(cbind(pred.coords,pred.p.3),no.X=100, no.Y=100, extend=TRUE)$xyz.est
image.plot(surf)

################################
##Spatial multivariate poisson
################################
##Generate some data
n <- 100
q <- 3   
nltr <- q*(q-1)/2+q

coords <- cbind(runif(n,1,100),runif(n,1,100))

theta <- rep(3/50,q)

A <- matrix(0,q,q)
A[lower.tri(A,TRUE)] <- rnorm(nltr,1,1)
K <- A%*%t(A)

Psi <- diag(0,q)

c1 <- mvCovInvLogDet(coords=coords, knots=coords,
                     cov.model="exponential",
                     V=K, Psi=Psi, theta=theta,
                     modified.pp=TRUE, SWM=FALSE)

w <- mvrnorm(1,rep(0,nrow(c1$C)),c1$C)

X <- mkMvX(list(matrix(1,n,1), matrix(1,n,1), matrix(1,n,1)))
beta <- c(-1,0,1)
y <- rpois(n*q, exp(X%*%beta+w))

y.1 <- y[seq(1,length(y),q)]
y.2 <- y[seq(2,length(y),q)]
y.3 <- y[seq(3,length(y),q)]

##Specify starting values and collect samples. For
##a true analysis, several longer chains should be
##run.
A.starting <- diag(1,q)[lower.tri(diag(1,q), TRUE)]

beta.starting <- coefficients(glm(y~X-1, family="poisson"))
beta.tuning <- t(chol(vcov(glm(y~X-1, family="poisson"))))

n.samples <- 5000

m.1 <- spMvGLM(list(y.1~1,y.2~1,y.3~1), family="poisson", coords=coords,
               starting=
               list("beta"=beta.starting, "phi"=rep(0.06,q),
                    "A"=A.starting, "w"=0), 
               tuning=
               list("beta"=beta.tuning, "phi"=rep(0.01,q),
                    "A"=rep(0.005,nltr), "w"=0.001),
               priors=
               list("beta.Flat", "phi.Unif"=rep(c(0.03, 0.3),q),
                    "K.IW"=list(q+1, diag(0.1,q))),
               cov.model="exponential",
               n.samples=n.samples, sub.sample=c(5000,n.samples,10),
               verbose=TRUE, n.report=500)

m.1$p.samples[,paste("phi_",1:q,sep="")] <-
  3/m.1$p.samples[,paste("phi_",1:q,sep="")]

print(summary(mcmc(m.1$p.samples)))

beta.hat <- colMeans(m.1$p.samples[,1:q])
w.hat <- rowMeans(m.1$sp.effects)

y.hat <- exp(X%*%beta.hat+w.hat)

y.hat.1 <- y.hat[seq(1,length(y.hat),q)]
y.hat.2 <- y.hat[seq(2,length(y.hat),q)]
y.hat.3 <- y.hat[seq(3,length(y.hat),q)]

##Take a look
par(mfrow=c(3,2))
surf <- mba.surf(cbind(coords,y.1),no.X=100, no.Y=100, extend=TRUE)$xyz.est
image(surf, main="Observed counts")
contour(surf, drawlabels=FALSE, add=TRUE)
text(coords, labels=y.1, cex=1, col="blue")

surf <- mba.surf(cbind(coords,y.hat.1),no.X=100, no.Y=100, extend=TRUE)$xyz.est
image(surf, main="Fitted counts")
contour(surf, add=TRUE)
contour(surf, drawlabels=FALSE, add=TRUE)
text(coords, labels=round(y.hat.1,0), cex=1, col="blue")

surf <- mba.surf(cbind(coords,y.2),no.X=100, no.Y=100, extend=TRUE)$xyz.est
image(surf)
contour(surf, drawlabels=FALSE, add=TRUE)
text(coords, labels=y.2, cex=1, col="blue")

surf <- mba.surf(cbind(coords,y.hat.2),no.X=100, no.Y=100, extend=TRUE)$xyz.est
image(surf)
contour(surf, drawlabels=FALSE, add=TRUE)
text(coords, labels=round(y.hat.2,0), cex=1, col="blue")

surf <- mba.surf(cbind(coords,y.3),no.X=100, no.Y=100, extend=TRUE)$xyz.est
image(surf)
contour(surf, drawlabels=FALSE, add=TRUE)
text(coords, labels=y.3, cex=1, col="blue")

surf <- mba.surf(cbind(coords,y.hat.3),no.X=100, no.Y=100, extend=TRUE)$xyz.est
image(surf)
contour(surf, drawlabels=FALSE, add=TRUE)
text(coords, labels=round(y.hat.3,0), cex=1, col="blue")

Run the code above in your browser using DataLab