Fit a sparse SGLMM.
sparse.sglmm(formula, family = gaussian, data, offset, A, method = c("BSF",
"RSR"), attractive = 50, repulsive = 0, tol = 0.01, minit = 10000,
maxit = 1e+06, tune = list(), hyper = list(), model = TRUE,
x = FALSE, y = FALSE, verbose = FALSE, parallel = FALSE)
an object of class formula: a symbolic description of the model to be fitted.
a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function, or the result of a call to a family function. (See family for details of family functions.) Supported families are binomial, gaussian (default), negbinomial, and poisson.
an optional data frame, list, or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which sparse.sglmm is called.
this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if more than one is specified their sum is used. See model.offset.
the adjacency matrix for the underlying graph.
the basis to use. The options are Bayesian spatial filtering (“BSF”) and restricted spatial regression (“RSR”).
the number of attractive Moran eigenvectors to use. The default is 50. See 'Details' for more information.
the number of repulsive Moran eigenvectors to use. The default is 0. See 'Details' for more information.
a tolerance. If all Monte Carlo standard errors are smaller than tol, no more samples are drawn from the posterior. The default is 0.01.
the minimum sample size. This should be large enough to permit accurate estimation of Monte Carlo standard errors. The default is 10,000.
the maximum sample size. Sampling from the posterior terminates when all Monte Carlo standard errors are smaller than tol or when maxit samples have been drawn, whichever happens first. The default is 1,000,000.
(where relevant) a list containing sigma.s, sigma.h, and sigma.theta. These are the standard deviations for the \(\gamma\), \(\delta\), and \(\theta\) proposals, respectively.
a list containing sigma.b, the prior standard deviation for \(\beta\), and (where relevant) a.h and b.h, the parameters of the gamma prior for \(\tau_h\).
a logical value indicating whether the model frame should be included as a component of the returned value.
a logical value indicating whether the model matrix used in the fitting process should be returned as a component of the returned value.
a logical value indicating whether the response vector used in the fitting process should be returned as a component of the returned value.
a logical value indicating whether to print MCMC progress to the screen. Defaults to FALSE.
(for parallelized computation of the Moran operator) a list containing type and nodes, the cluster type and number of slave nodes, respectively. The former must be one of “FORK”, “MPI”, “NWS”, “PSOCK”, or “SOCK” (default). The latter must be a whole number greater than 1. This argument defaults to FALSE, in which case the matrix multiplications are not parallelized.
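As a rough illustration of how the list-valued and matrix-valued arguments above might be prepared, the following base R sketch builds an adjacency matrix for a small toy lattice and assembles tune, hyper, and parallel lists. The lattice, the object name par_opts, and the numeric values for sigma.h, sigma.theta, a.h, and b.h are assumptions chosen for illustration only; they are not documented defaults.

```r
# Toy graph (an assumption for illustration): a 3 x 3 lattice in which
# two cells are neighbours when their Manhattan distance is exactly 1.
side <- 3
coords <- expand.grid(row = seq_len(side), col = seq_len(side))
A <- (as.matrix(dist(coords, method = "manhattan")) == 1) * 1
dimnames(A) <- NULL

# Proposal standard deviations for gamma, delta, and theta.
# Only the elements relevant to the chosen family are needed;
# the values here are illustrative.
tune <- list(sigma.s = 0.02, sigma.h = 0.02, sigma.theta = 0.1)

# Prior settings. sigma.b = 1000 is the documented default;
# a.h and b.h are illustrative values, not documented defaults.
hyper <- list(sigma.b = 1000, a.h = 0.01, b.h = 100)

# Cluster settings for parallel computation of the Moran operator.
par_opts <- list(type = "SOCK", nodes = 4)

# These objects would then be supplied in a call such as
# sparse.sglmm(..., A = A, tune = tune, hyper = hyper, parallel = par_opts)
```

An adjacency matrix built this way is symmetric, has a zero diagonal, and contains a 1 wherever two vertices share an edge.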
sparse.sglmm returns an object of class “sparse.sglmm”, which is a list containing the following components.
the estimated regression coefficients.
the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function.
the linear fit on link scale.
the response residuals.
the size of the posterior sample.
an iter by \(p\) matrix containing the posterior samples for \(\beta\).
an iter by \(q\) matrix containing the posterior samples for \(\gamma\).
(where relevant) an iter by \(q\) matrix containing the posterior samples for \(\delta\).
(where relevant) a vector containing the posterior samples for \(\theta\).
a vector containing the posterior samples for \(\tau_s\).
(where relevant) a vector containing the posterior samples for \(\tau_h\).
the estimate of \(\gamma\).
(where relevant) the estimate of \(\delta\).
the estimate of \(\tau_s\).
(where relevant) the estimate of \(\tau_h\).
(where relevant) the estimate of \(\theta\).
the Monte Carlo standard errors for \(\beta\).
the Monte Carlo standard errors for \(\gamma\).
(where relevant) the Monte Carlo standard errors for \(\delta\).
the Monte Carlo standard error for \(\tau_s\).
(where relevant) the Monte Carlo standard error for \(\tau_h\).
(where relevant) the Monte Carlo standard error for \(\theta\).
the goodness of fit component of the DIC.
the penalty component of the DIC.
the deviance information criterion.
the acceptance rate for \(\beta\).
the acceptance rate for \(\gamma\).
(where relevant) the acceptance rate for \(\delta\).
(where relevant) the acceptance rate for \(\theta\).
if requested, the y vector used.
if requested, the model matrix.
if requested, the matrix of Moran eigenvectors.
if requested, the spectrum of the Moran operator.
a list containing the names and values of the hyperparameters.
a list containing the names and values of the tuning parameters.
if requested (the default), the model frame.
the matched call.
the formula supplied.
the terms object used.
the data argument.
the offset vector used.
(where relevant) a record of the levels of the factors used in fitting.
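The three DIC components listed above can be related through the standard definition of the deviance information criterion: \(\mathrm{DIC} = \bar{D} + p_D\), where \(\bar{D}\) is the posterior mean deviance (goodness of fit) and \(p_D = \bar{D} - D(\bar{\theta})\) is the penalty. The sketch below computes these quantities for a toy Poisson model. It assumes the package follows the standard definition; the data are simulated, the "posterior sample" is a normal stand-in rather than real MCMC output, and the names D.bar, pD, and dic are chosen here for illustration.

```r
set.seed(1)

# Simulated counts from a Poisson model with a single log-mean parameter.
y <- rpois(50, lambda = exp(1))

# Stand-in "posterior sample" for the log-mean (NOT real MCMC output):
# a normal approximation centred at the maximum likelihood estimate.
b_samples <- rnorm(2000, mean = log(mean(y)), sd = 1 / sqrt(sum(y)))

# Deviance: minus twice the log-likelihood at a given parameter value.
deviance_at <- function(b) -2 * sum(dpois(y, lambda = exp(b), log = TRUE))

# Goodness of fit: the deviance averaged over the posterior sample.
D.bar <- mean(vapply(b_samples, deviance_at, numeric(1)))

# Penalty: mean deviance minus the deviance at the posterior mean.
pD <- D.bar - deviance_at(mean(b_samples))

# Deviance information criterion.
dic <- D.bar + pD
```

Smaller values of DIC indicate a better trade-off between fit and model complexity when comparing models fitted to the same data.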
This function fits the sparse restricted spatial regression model of Hughes and Haran (2013), or the Bayesian spatial filtering model of Hughes (2017). The first stage of the model is $$g(\mu_i)=x_i^\prime\beta+m_i^\prime\gamma\hspace{1 cm}(i=1,\dots,n)$$ or, in vectorized form, $$g(\mu)=X\beta+M\gamma,$$ where \(X\) is the design matrix, \(\beta\) is a \(p\)-vector of regression coefficients, the columns of \(M\) are \(q\) eigenvectors of the Moran operator, and \(\gamma\) are spatial random effects. Arguments attractive and repulsive can be used to control the number of eigenvectors used. The default values are 50 and 0, respectively, which corresponds to pure spatial smoothing. Inclusion of some repulsive eigenvectors can be advantageous in certain applications.
The second stage, i.e., the prior for \(\gamma\), is $$p(\gamma\mid\tau_s)\propto\tau_s^{q/2}\exp\left(-\frac{\tau_s}{2}\gamma^\prime M^\prime QM\gamma\right),$$ where \(\tau_s\) is a smoothing parameter and \(Q\) is the graph Laplacian. The prior for \(\beta\) is spherical \(p\)-variate normal with mean zero and common standard deviation sigma.b, which defaults to 1,000. The prior for \(\tau_s\) is gamma with parameters 0.5 and 2,000. The same prior is used for \(\theta\) (when family is negbinomial).
When the response is normally distributed, the identity link is assumed, in which case the first stage is $$\mu=X\beta+M\gamma+M\delta,$$ where \(\delta\) are heterogeneity random effects. When the response is Poisson distributed, heterogeneity random effects are optional. In any case, the prior on \(\delta\) is spherical \(q\)-variate normal with mean zero and common variance \(1/\tau_h\). The prior for \(\tau_h\) is gamma with parameters \(a_h\) and \(b_h\), the values of which are controlled by the user through argument hyper.
If the response is Bernoulli, negative binomial, or Poisson, \(\beta\) and \(\gamma\) are updated using Metropolis-Hastings random walks with normal proposals. The proposal covariance matrix for \(\beta\) is the estimated asymptotic covariance matrix from a glm fit to the data (see vcov). The proposal for \(\gamma\) is spherical normal with common standard deviation sigma.s.
The updates for \(\tau_s\) and \(\tau_h\) are Gibbs updates irrespective of the response distribution. If the response is Poisson distributed and heterogeneity random effects are included, those random effects are updated using a Metropolis-Hastings random walk with a spherical normal proposal. The common standard deviation is sigma.h. If the response is normally distributed, all updates are Gibbs updates. If the response is negative binomial, the dispersion parameter \(\theta\) is updated using a Metropolis-Hastings random walk with a normal proposal. Said proposal has standard deviation sigma.theta, which can be provided by the user as an element of argument tune.
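To make the role of the Moran eigenvectors more concrete, the following base R sketch constructs a basis \(M\) and the matrix \(M^\prime QM\) from the prior for \(\gamma\) on a toy lattice. It assumes the restricted spatial regression form of the Moran operator from Hughes and Haran (2013), \(P^\perp AP^\perp\) with \(P^\perp=I-X(X^\prime X)^{-1}X^\prime\). The lattice, the toy design matrix, and all object names are illustrative, and this is not necessarily how the package computes the basis internally.

```r
# Toy graph (an assumption for illustration): 5 x 5 lattice, rook adjacency.
side <- 5
coords <- expand.grid(row = seq_len(side), col = seq_len(side))
n <- nrow(coords)
A <- (as.matrix(dist(coords, method = "manhattan")) == 1) * 1

# Toy design matrix: intercept plus the two lattice coordinates.
X <- cbind(1, coords$row, coords$col)

# Projection onto the orthogonal complement of the column space of X.
P <- diag(n) - X %*% solve(crossprod(X), t(X))

# Moran operator and its spectrum; eigen() returns the eigenvalues of a
# symmetric matrix in decreasing order.
eig <- eigen(P %*% A %*% P, symmetric = TRUE)

attractive <- 6  # eigenvectors with the largest (positive) eigenvalues
repulsive <- 0   # eigenvectors with the most negative eigenvalues
keep <- c(seq_len(attractive), n - seq_len(repulsive) + 1)
M <- eig$vectors[, keep, drop = FALSE]

# Graph Laplacian Q and the q x q matrix M'QM from the prior for gamma.
Q <- diag(rowSums(A)) - A
MQM <- crossprod(M, Q %*% M)
```

Because the retained eigenvectors have nonzero eigenvalues, the columns of \(M\) are orthogonal to the columns of \(X\), which is the property that alleviates spatial confounding in the restricted regression.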
Hughes, J. and Haran, M. (2013) Dimension reduction and alleviation of confounding for spatial generalized linear mixed models. Journal of the Royal Statistical Society, Series B, 75(1), 139-159.
residuals.sparse.sglmm, summary.sparse.sglmm, vcov.sparse.sglmm
# NOT RUN {
# The following code duplicates the analysis described in Hughes and Haran (2013). The data are
# infant mortality data for 3,071 US counties. We do a spatial Poisson regression with offset.
library(ngspatial)  # package providing sparse.sglmm and the example data
data(infant)
# Convert the low birth weight count to a proportion of births.
infant$low_weight = infant$low_weight / infant$births
attach(infant)
Z = deaths
# Design matrix with an explicit intercept column (hence "- 1" in the formula).
X = cbind(1, low_weight, black, hispanic, gini, affluence, stability)
data(A)  # adjacency matrix for the counties
set.seed(123456)
fit = sparse.sglmm(Z ~ X - 1 + offset(log(births)), family = poisson, A = A, method = "RSR",
                   tune = list(sigma.s = 0.02), verbose = TRUE)
summary(fit)
# }