This function obtains MCMC draws of specific small area area-level models defined by the sampling model and linking model.
BayesSAE(formula, innov = "normal", df = NULL, b = NA, spatial = FALSE, tran =
"F", prox = NULL, beta.start = NULL, theta.start = NULL, lam.start = runif(1),
prior = NULL, mcmc = 5000, burnin = 2500, thin = 5, data)
a symbolic description of the model to be fitted (of type y ~ x | z). y
is the
response variable in the sampling model while x
is the design matrix in the linking model,
and z
is the estimated variance of direct estimation in the sampling model. See Rao (2003)
for details about the sampling model and linking model in small area estimation.
distribution of innovations in the sampling model. to be chosen between "normal"
and "t".
a vector containing degrees of freedom for the t
innovation in the sampling model
if innov = "t"
.
an optional weights vector to be used in the fitting process. Number of domains must be the same as that of the direct estimators. By default is a vector of ones. See Details.
a logical variable indicating whether it's a spatial model or not.
the transformation to be taken on the responsive variable in the linking model, to be chosen
between "F"
, "log"
or "logit"
. "F"
is the default value and indicates no
transformation taken. See Details.
l * 2
matrix defining the neighbourhood matrix. See also Details.
initial values for beta's. By default is the coefficients of regression model where the response vector is directly regressed on the design matrix in the linking model.
initial values for theta's. By default is the response vector.
initial value for lambda in the spatial model. See You and Chapman (2006).
a list of objects specifying priors. See Details.
user-specified number of MCMC draws.
The number of burnin iterations for the sampler. See Gelman (2006).
the thinning interval used in the simulation. See Gelman (2006).
an optional data frame, list or environment containing variables in the model.
The function returns a object of class "BayesSAE"
containing the following components:
an mcmc object that contains the posterior sample. This object can be summarized by functions provided by the coda package
character string indicating the type of the model. For instance "UFH" indicates that it's an unmatched Fay-Herriot model while "SYC" refers to spatial You-Chapman model
response vector in the sampling model
design matrix in the linking model
variance of direct estimation in the sampling model
the acceptance rate of \(\lambda\) since draws of \(\lambda\) are generated by M-H algorithm
average deviance, defined as $$-\frac{2}{n} \sum_{i=1}^n p(y|\theta^{i})$$ where \(\theta^{i}\) donates the ith posterior draw of \(\theta\).
discrepancy between data and model depending on a point estimator for \(\theta\), defined as $$-2 p(y|\hat{\theta})$$ where \(\hat{\theta}\) is the point estimator for \(\theta\). Here we calculate the posterior mean as the point estimator
deviance information criterion, defined as 2D_avg - D_theta.hat
. Small DIC
value
indicates strong out-of-sample predictive power
a vector of length the same as number of domains provides Rao-Blackwell estimators for each area.
the original function call
Rao-Blackwellization of theta's
logical variable indicating whether the model is spatial or not
character indicating the transformation of response variable in the linking model
If it's an unmatched model, \theta_i's are generated by M-H algorithm and theta.rate as a vector of length m provides acceptance rate for each \theta_i respectively is involved
%% ~Describe the value returned %% If it is a LIST, use %% \item{comp1 }{Description of 'comp1'} %% \item{comp2 }{Description of 'comp2'} %% ...
Let \(\theta_i\) donates variable of interest for each domain i, \(x_i\) the regressors,
\(\beta\) the regression coefficient, \(v_i\) i.i.d normal innovations. If argument b
is
specified, the linking model is of the form: \(\theta_i = x_i \beta + b_i v_i\).
If tran = "log"
, the linking model will be: \(\log(\theta_i) = x_i \beta + b_i v_i\).
tran = "logit"
means that logit transformation will be taken and the model will be: \(logit(\theta_i)
= x_i \beta + b_i v_i\). Both are unmatched area level models. See Rao (2003).
The neighbourhood matrix has the ith diagonal element equal to the number of neighbours of area i, and
off-diagonal elements equal to -1
if the corresponding areas are neighbours otherwise 0
.
See You and Chapman (2006).
The ith tuple in the argument prox
indicates that area prox[i, 1]
and area prox[i, 2]
are neighbours. Duplicated tuples will be omitted. For example, if the first row of prox
is (1, 2)
while the second is (2, 1), the second row will be deleted. The two elements within each tuple are not
supposed to be the same.
Initial values are crucial to MCMC convergence. EBLUP predictors of \(\theta\)'s and \(\beta\)'s can provides good starting values for MCMC procedure.
The list prior
should include following attributes for basic Fay-Herriot model:
beta.type
: to be chosen between "non_in"
or "normal"
. If beta.type = "non_in"
,
non-informative prior would be specified for \(\beta\). Otherwise, prior for \(\beta\) would be normal
distribution.
beta.prior
: a list contains components beta0
and eps1
if beta.type = "normal"
.
As a result \(\beta\) will be distributed with mean \(\mu = \beta_0\) and covariance matrix
\(\Sigma = diag(rep(1/eps1, p))\) where p is length of \(\beta\) including the intercept term.
sigv.type
: to be chosen between "inv_gamma"
and "unif"
. \(\sigma_v^2\) is the
variance of residual in the linking model. If sigv.type = "inv_gamma"
, inverse gamma prior would be specified
for \(\sigma_v^2\). Otherwise, \(\sigma_v^2\) would be uniformly distributed.
sigv.prior
: a list containing components a0
and b0
as shape and rate parameter in the
gamma prior. if sigv.type = "inv_gamma"
. Otherwise the list should contain the eps2
component and
consequently \(\sigma_v^2\) would be uniformly distributed on (0, 1 / eps2
)
Besides, the prior
list should also include attribute sig2.prior
to specify priors for \(\sigma_i^2\)
in the You-Chapman model. See You and Chapman (2006). The sig2.prior
is also a list contains the
components ai
and bi
. Both ai
and bi
are vectors whose length are the same as
number of domains. Thus, prior for \(\sigma_i^2\) would be inverse gamma distribution with shape parameter
ai[i]
and rate parameter bi[i]
. Default value of elements in ai
and bi
are all 0.05.
You, Y. and Chapman, B. (2006) Small Area Estimation Using Area Level Models and Estimated Sampling Variances. Survey Methodology, 32: 97-103.
Rao, J. N. K. (2003) Small Area Estimation. John Wiley and Sons.
Gelman, A. and Carlin, J. B. and Stern, H. S. and Rubin, D. B. (2006). Bayesian Data Analysis, CRC Press Company.
# NOT RUN {
# load data set
data(SAIPE)
m <- length(SAIPE$SACPR)
# basic Fay-Herriort models (FH)
result <- BayesSAE(SACPR~SNAPR+CenPR+CPER|Vardir, data = SAIPE, mcmc = 5000)
# You-Chapman models (YC)
result <- BayesSAE(SACPR~SNAPR+CenPR+CPER|Vardir, data = SAIPE, mcmc = 5000, innov = "t",
df = rep(50, m))
# spatial model with unknown sampling variance (SYC)
# define the neighbourhood matrix
prox <- cbind(sample(1:51, 50, replace = TRUE), sample(1:51, 50, replace = TRUE))
prox <- prox[prox[,1] != prox[,2], ]
result <- BayesSAE(SACPR~SNAPR+CenPR+CPER|Vardir, data = SAIPE, mcmc = 5000, innov = "t",
df = rep(50, m), spatial = TRUE, prox = prox)
# Unmatched models (UFH)
result <- BayesSAE(SACPR~SNAPR+CenPR+CPER|Vardir, data = SAIPE, mcmc = 5000, tran = "log")
# }
Run the code above in your browser using DataLab