sparse.sglmm: Fit a sparse SGLMM.

Description

Fit a sparse SGLMM.

Usage

sparse.sglmm(formula, family = gaussian, data, offset, A, method = c("BSF",
  "RSR"), attractive = 50, repulsive = 0, tol = 0.01, minit = 10000,
  maxit = 1e+06, tune = list(), hyper = list(), model = TRUE,
  x = FALSE, y = FALSE, verbose = FALSE, parallel = FALSE)

Arguments

formula

an object of class formula: a symbolic description of the model to be fitted.

family

a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function, or the result of a call to a family function. (See family for details of family functions.) Supported families are binomial, gaussian (default), negbinomial, and poisson.

data

an optional data frame, list, or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which sparse.sglmm is called.

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if more than one is specified their sum is used. See model.offset.

the adjacency matrix for the underlying graph.

method

the basis to use. The options are Bayesian spatial filtering (“BSF”) and restricted spatial regression (“RSR”).

attractive

the number of attractive Moran eigenvectors to use. The default is 50. See `Details' for more information.

repulsive

the number of repulsive Moran eigenvectors to use. The default is 0. See `Details' for more information.

tol

a tolerance. If all Monte Carlo standard errors are smaller than tol, no more samples are drawn from the posterior. The default is 0.01.

minit

the minimum sample size. This should be large enough to permit accurate estimation of Monte Carlo standard errors. The default is 10,000.

maxit

the maximum sample size. Sampling from the posterior terminates when all Monte Carlo standard errors are smaller than tol or when maxit samples have been drawn, whichever happens first. The default is 1,000,000.

tune

(where relevant) a list containing sigma.s, sigma.h, and sigma.theta. These are the standard deviations for the $\gamma$, $\delta$, and $\theta$ proposals, respectively.

hyper

a list containing sigma.b, the prior standard deviation for $\beta$, and (where relevant) a.h and b.h, the parameters of the gamma prior for $\tau_h$.

model

a logical value indicating whether the model frame should be included as a component of the returned value.

a logical value indicating whether the model matrix used in the fitting process should be returned as a component of the returned value.

a logical value indicating whether the response vector used in the fitting process should be returned as a component of the returned value.

verbose

a logical value indicating whether to print MCMC progress to the screen. Defaults to FALSE.

parallel

(for parallelized computation of the Moran operator) a list containing type and nodes, the cluster type and number of slave nodes, respectively. The former must be one of “FORK”, “MPI”, “NWS”, “PSOCK”, or “SOCK” (default). The latter must be a whole number greater than 1. This argument defaults to FALSE, in which case the matrix multiplications are not parallelized.

Value

sparse.sglmm returns an object of class “sparse.sglmm”, which is a list containing the following components.

coefficients

the estimated regression coefficients.

fitted.values

the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function.

linear.predictors

the linear fit on link scale.

residuals

the response residuals.

iter

the size of the posterior sample.

beta.sample

an iter by $p$ matrix containing the posterior samples for $\beta$.

gamma.sample

an iter by $q$ matrix containing the posterior samples for $\gamma$.

delta.sample

(where relevant) an iter by $q$ matrix containing the posterior samples for $\delta$.

theta.sample

(where relevant) a vector containing the posterior samples for $\theta$.

tau.s.sample

a vector containing the posterior samples for $\tau_s$.

tau.h.sample

(where relevant) a vector containing the posterior samples for $\tau_h$.

gamma.est

the estimate of $\gamma$.

delta.est

(where relevant) the estimate of $\delta$.

tau.s.est

the estimate of $\tau_s$.

tau.h.est

(where relevant) the estimate of $\tau_h$.

theta.est

(where relevant) the estimate of $\theta$.

beta.mcse

the Monte Carlo standard errors for $\beta$.

gamma.mcse

the Monte Carlo standard errors for $\gamma$.

delta.mcse

(where relevant) the Monte Carlo standard errors for $\delta$.

tau.s.mcse

the Monte Carlo standard error for $\tau_s$.

tau.h.mcse

(where relevant) the Monte Carlo standard error for $\tau_h$.

theta.mcse

(where relevant) the Monte Carlo standard error for $\theta$.

D.bar

the goodness of fit component of the DIC.

the penalty component of the DIC.

dic

the deviance information criterion.

beta.accept

the acceptance rate for $\beta$.

gamma.accept

the acceptance rate for $\gamma$.

delta.accept

(where relevant) the acceptance rate for $\delta$.

theta.accept

(where relevant) the acceptance rate for $\theta$.

if requested (the default), the y vector used.

if requested, the model matrix.

if requested, the matrix of Moran eigenvectors.

eigen.values

if requested, the spectrum of the Moran operator.

hyper

a list containing the names and values of the hyperparameters.

tune

a list containing the names and values of the tuning parameters.

model

if requested (the default), the model frame.

call

the matched call.

formula

the formula supplied.

terms

the terms object used.

data

the data argument.

offset

the offset vector used.

xlevels

(where relevant) a record of the levels of the factors used in fitting.

Details

This function fits the sparse restricted spatial regression model of Hughes and Haran (2013), or the Bayesian spatial filtering model of Hughes (2017). The first stage of the model is $$g(\mu_i)=x_i^\prime\beta+m_i^\prime\gamma\hspace{1 cm}(i=1,\dots,n)$$ or, in vectorized form, $$g(\mu)=X\beta+M\gamma,$$ where $X$ is the design matrix, $\beta$ is a $p$-vector of regression coefficients, the columns of $M$ are $q$ eigenvectors of the Moran operator, and $\gamma$ are spatial random effects. Arguments attractive and repulsive can be used to control the number of eigenvectors used. The default values are 50 and 0, respectively, which corresponds to pure spatial smoothing. Inclusion of some repulsive eigenvectors can be advantageous in certain applications. The second stage, i.e., the prior for $\gamma$, is $$p(\gamma\mid\tau_s)\propto\tau_s^{q/2}\exp\left(-\frac{\tau_s}{2}\gamma^\prime M^\prime QM\gamma\right),$$ where $\tau_s$ is a smoothing parameter and $Q$ is the graph Laplacian. The prior for $\beta$ is spherical $p$-variate normal with mean zero and common standard deviation sigma.b, which defaults to 1,000. The prior for $\tau_s$ is gamma with parameters 0.5 and 2,000. The same prior is used for $\theta$ (when family is negbinomial). When the response is normally distributed, the identity link is assumed, in which case the first stage is $$\mu=X\beta+M\gamma+M\delta,$$ where $\delta$ are heterogeneity random effects. When the response is Poisson distributed, heterogeneity random effects are optional. In any case, the prior on $\delta$ is spherical $q$-variate normal with mean zero and common variance $1/\tau_h$. The prior for $\tau_h$ is gamma with parameters $a_h$ and $b_h$, the values of which are controlled by the user through argument hyper. If the response is Bernoulli, negative binomial, or Poisson, $\beta$ and $\gamma$ are updated using Metropolis-Hastings random walks with normal proposals. The proposal covariance matrix for $\beta$ is the estimated asymptotic covariance matrix from a glm fit to the data (see vcov). The proposal for $\gamma$ is spherical normal with common standard deviation sigma.s. The updates for $\tau_s$ and $\tau_h$ are Gibbs updates irrespective of the response distribution. If the response is Poisson distributed and heterogeneity random effects are included, those random effects are updated using a Metropolis-Hastings random walk with a spherical normal proposal. The common standard deviation is sigma.h. If the response is normally distributed, all updates are Gibbs updates. If the response is negative binomial, the dispersion parameter $\theta$ is updated using a Metropolis-Hastings random walk with a normal proposal. Said proposal has standard deviation sigma.theta, which can be provided by the user as an element of argument tune.

References

Hughes, J. and Haran, M. (2013) Dimension reduction and alleviation of confounding for spatial generalized linear mixed models. Journal of the Royal Statistical Society, Series B, 75(1), 139--159.

Examples

Run this code

# NOT RUN {
The following code duplicates the analysis described in (Hughes and Haran, 2013). The data are
infant mortality data for 3,071 US counties. We do a spatial Poisson regression with offset.

data(infant)
infant$low_weight = infant$low_weight / infant$births
attach(infant)
Z = deaths
X = cbind(1, low_weight, black, hispanic, gini, affluence, stability)
data(A)
set.seed(123456)
fit = sparse.sglmm(Z ~ X - 1 + offset(log(births)), family = poisson, A = A, method = "RSR",
                   tune = list(sigma.s = 0.02), verbose = TRUE)
summary(fit)
# }

Run the code above in your browser using DataLab