BayesSAE: MCMC Draws Based on Area-Level Models

Description

This function obtains MCMC draws of specific small area area-level models defined by the sampling model and linking model.

Usage

BayesSAE(formula, innov = "normal", df = NULL, b = NA, spatial = FALSE, tran = 
     "F", prox = NULL, beta.start = NULL, theta.start = NULL, lam.start = runif(1), 
     prior = NULL, mcmc = 5000, burnin = 2500, thin = 5, data)

Arguments

formula

a symbolic description of the model to be fitted (of type y ~ x | z). y is the response variable in the sampling model while x is the design matrix in the linking model, and z is the estimated variance of direct estimation in the sampling model. See Rao (2003) for details about the sampling model and linking model in small area estimation.

innov

distribution of innovations in the sampling model. to be chosen between "normal" and "t".

a vector containing degrees of freedom for the t innovation in the sampling model if innov = "t".

an optional weights vector to be used in the fitting process. Number of domains must be the same as that of the direct estimators. By default is a vector of ones. See Details.

spatial

a logical variable indicating whether it's a spatial model or not.

tran

the transformation to be taken on the responsive variable in the linking model, to be chosen between "F", "log" or "logit". "F" is the default value and indicates no transformation taken. See Details.

prox

l * 2 matrix defining the neighbourhood matrix. See also Details.

beta.start

initial values for beta's. By default is the coefficients of regression model where the response vector is directly regressed on the design matrix in the linking model.

theta.start

initial values for theta's. By default is the response vector.

lam.start

initial value for lambda in the spatial model. See You and Chapman (2006).

prior

a list of objects specifying priors. See Details.

mcmc

user-specified number of MCMC draws.

burnin

The number of burnin iterations for the sampler. See Gelman (2006).

thin

the thinning interval used in the simulation. See Gelman (2006).

data

an optional data frame, list or environment containing variables in the model.

Value

The function returns a object of class "BayesSAE" containing the following components:

mcmc

an mcmc object that contains the posterior sample. This object can be summarized by functions provided by the coda package

type

character string indicating the type of the model. For instance "UFH" indicates that it's an unmatched Fay-Herriot model while "SYC" refers to spatial You-Chapman model

response vector in the sampling model

design matrix in the linking model

variance of direct estimation in the sampling model

lam.rate

the acceptance rate of $\lambda$ since draws of $\lambda$ are generated by M-H algorithm

D_avg

average deviance, defined as $$-\frac{2}{n} \sum_{i=1}^n p(y|\theta^{i})$$ where $\theta^{i}$ donates the ith posterior draw of $\theta$.

D_theta.hat

discrepancy between data and model depending on a point estimator for $\theta$, defined as $$-2 p(y|\hat{\theta})$$ where $\hat{\theta}$ is the point estimator for $\theta$. Here we calculate the posterior mean as the point estimator

DIC

deviance information criterion, defined as 2D_avg - D_theta.hat. Small DIC value indicates strong out-of-sample predictive power

theta.HB

a vector of length the same as number of domains provides Rao-Blackwell estimators for each area.

call

the original function call

Rao-Blackwellization of theta's

spatial

logical variable indicating whether the model is spatial or not

tran

character indicating the transformation of response variable in the linking model

If it's an unmatched model, \theta_i's are generated by M-H algorithm and theta.rate as a vector of length m provides acceptance rate for each \theta_i respectively is involved

%% ~Describe the value returned %% If it is a LIST, use %% \item{comp1 }{Description of 'comp1'} %% \item{comp2 }{Description of 'comp2'} %% ...

Details

Let $\theta_i$ donates variable of interest for each domain i, $x_i$ the regressors, $\beta$ the regression coefficient, $v_i$ i.i.d normal innovations. If argument b is specified, the linking model is of the form: $\theta_i = x_i \beta + b_i v_i$.

If tran = "log", the linking model will be: $\log(\theta_i) = x_i \beta + b_i v_i$. tran = "logit" means that logit transformation will be taken and the model will be: $logit(\theta_i) = x_i \beta + b_i v_i$. Both are unmatched area level models. See Rao (2003).

The neighbourhood matrix has the ith diagonal element equal to the number of neighbours of area i, and off-diagonal elements equal to -1 if the corresponding areas are neighbours otherwise 0. See You and Chapman (2006).

The ith tuple in the argument prox indicates that area prox[i, 1] and area prox[i, 2] are neighbours. Duplicated tuples will be omitted. For example, if the first row of prox is (1, 2) while the second is (2, 1), the second row will be deleted. The two elements within each tuple are not supposed to be the same.

Initial values are crucial to MCMC convergence. EBLUP predictors of $\theta$'s and $\beta$'s can provides good starting values for MCMC procedure.

The list prior should include following attributes for basic Fay-Herriot model:

beta.type: to be chosen between "non_in" or "normal". If beta.type = "non_in", non-informative prior would be specified for $\beta$. Otherwise, prior for $\beta$ would be normal distribution.
beta.prior: a list contains components beta0 and eps1 if beta.type = "normal". As a result $\beta$ will be distributed with mean $\mu = \beta_0$ and covariance matrix $\Sigma = diag(rep(1/eps1, p))$ where p is length of $\beta$ including the intercept term.
sigv.type: to be chosen between "inv_gamma" and "unif". $\sigma_v^2$ is the variance of residual in the linking model. If sigv.type = "inv_gamma", inverse gamma prior would be specified for $\sigma_v^2$. Otherwise, $\sigma_v^2$ would be uniformly distributed.
sigv.prior: a list containing components a0 and b0 as shape and rate parameter in the gamma prior. if sigv.type = "inv_gamma". Otherwise the list should contain the eps2 component and consequently $\sigma_v^2$ would be uniformly distributed on (0, 1 / eps2)

Besides, the prior list should also include attribute sig2.prior to specify priors for $\sigma_i^2$ in the You-Chapman model. See You and Chapman (2006). The sig2.prior is also a list contains the components ai and bi. Both ai and bi are vectors whose length are the same as number of domains. Thus, prior for $\sigma_i^2$ would be inverse gamma distribution with shape parameter ai[i] and rate parameter bi[i]. Default value of elements in ai and bi are all 0.05.

References

You, Y. and Chapman, B. (2006) Small Area Estimation Using Area Level Models and Estimated Sampling Variances. Survey Methodology, 32: 97-103.

Rao, J. N. K. (2003) Small Area Estimation. John Wiley and Sons.

Gelman, A. and Carlin, J. B. and Stern, H. S. and Rubin, D. B. (2006). Bayesian Data Analysis, CRC Press Company.

Examples

Run this code

# NOT RUN {
# load data set
data(SAIPE)
m <- length(SAIPE$SACPR)

# basic Fay-Herriort models (FH)
result <- BayesSAE(SACPR~SNAPR+CenPR+CPER|Vardir, data = SAIPE, mcmc = 5000)

# You-Chapman models (YC)
result <- BayesSAE(SACPR~SNAPR+CenPR+CPER|Vardir, data = SAIPE, mcmc = 5000, innov = "t", 
    df = rep(50, m))
	
# spatial model with unknown sampling variance (SYC)
# define the neighbourhood matrix
prox <- cbind(sample(1:51, 50, replace = TRUE), sample(1:51, 50, replace = TRUE))
prox <- prox[prox[,1] != prox[,2], ]
result <- BayesSAE(SACPR~SNAPR+CenPR+CPER|Vardir, data = SAIPE, mcmc = 5000, innov = "t",
    df = rep(50, m), spatial = TRUE, prox = prox)

# Unmatched models (UFH)
result <- BayesSAE(SACPR~SNAPR+CenPR+CPER|Vardir, data = SAIPE, mcmc = 5000, tran = "log")	
# }

Run the code above in your browser using DataLab