eco: Fitting the Parametric Bayesian Model of Ecological Inference in 2x2 Tables

Description

eco is used to fit the parametric Bayesian model (based on a Normal/Inverse-Wishart prior) for ecological inference in \(2 \times 2\) tables via Markov chain Monte Carlo. It gives the in-sample predictions as well as the estimates of the model parameters. The model and algorithm are described in Imai, Lu and Strauss (2008, 2011).

Usage

eco(
  formula,
  data = parent.frame(),
  N = NULL,
  supplement = NULL,
  context = FALSE,
  mu0 = 0,
  tau0 = 2,
  nu0 = 4,
  S0 = 10,
  mu.start = 0,
  Sigma.start = 10,
  parameter = TRUE,
  grid = FALSE,
  n.draws = 5000,
  burnin = 0,
  thin = 0,
  verbose = FALSE
)

Value

An object of class eco containing the following elements:

call: The matched call.
X: The row margin, \(X\).
Y: The column margin, \(Y\).
N: The size of each table, \(N\).
burnin: The number of initial burnin draws.
thin: The thinning interval.
nu0: The prior degrees of freedom.
tau0: The prior scale parameter.
mu0: The prior mean.
S0: The prior scale matrix.
W: A three dimensional array storing the posterior in-sample predictions of \(W\). The first dimension indexes the Monte Carlo draws, the second dimension indexes the columns of the table, and the third dimension represents the observations.
Wmin: A numeric matrix storing the lower bounds of \(W\).
Wmax: A numeric matrix storing the upper bounds of \(W\).

The following additional elements are included in the output when parameter = TRUE.

mu: The posterior draws of the population mean parameter, \(\mu\).
Sigma: The posterior draws of the population variance matrix, \(\Sigma\).
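
As a minimal sketch of how the W element might be summarized, the code below builds a simulated array with the layout described above (draws by columns by observations) and computes per-unit posterior means and intervals. The array W_sim and its dimensions are made up for illustration; with a fitted object one would presumably use res$W in its place.

```r
## Simulated stand-in for res$W: 1000 draws, 2 columns (W1, W2), 5 units.
## These values are random placeholders, not output from eco().
set.seed(1)
n_draws <- 1000
n_units <- 5
W_sim <- array(runif(n_draws * 2 * n_units), dim = c(n_draws, 2, n_units))

## Posterior mean of W1 and W2 for each unit: average over the draws
## (first dimension), keeping the column and observation dimensions.
W_mean <- apply(W_sim, c(2, 3), mean)    # 2 x n_units matrix

## 95% equal-tailed interval for W1 in each unit (one column per unit).
W1_ci <- apply(W_sim[, 1, ], 2, quantile, probs = c(0.025, 0.975))

## Unweighted average of W1 across units, one value per draw.
W1_agg <- rowMeans(W_sim[, 1, ])
```

Averaging over the first dimension is the natural choice here because that dimension indexes the Monte Carlo draws.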

Arguments

formula: A symbolic description of the model to be fit, specifying the column and row margins of \(2 \times 2\) ecological tables. Y ~ X specifies Y as the column margin (e.g., turnout) and X as the row margin (e.g., percent African-American). Details and specific examples are given below.
data: An optional data frame in which to interpret the variables in formula. The default is the environment in which eco is called.
N: An optional variable representing the size of the unit; e.g., the total number of voters. N needs to be either a scalar or a vector of the same length as Y and X.
supplement: An optional matrix of supplemental data. The matrix has two columns, which contain additional individual-level data such as survey data for \(W_1\) and \(W_2\), respectively. If NULL, no additional individual-level data are included in the model. The default is NULL.
context: Logical. If TRUE, the contextual effect is also modeled; that is, the row margin \(X\) and the unknown \(W_1\) and \(W_2\) are assumed to be correlated. See Imai, Lu and Strauss (2008, 2011) for details. The default is FALSE.
mu0: A scalar or a numeric vector that specifies the prior mean for the mean parameter \(\mu\) for \((W_1,W_2)\) (or for \((W_1, W_2, X)\) if context=TRUE). If mu0 is a scalar, its value will be repeated to yield a vector of the length of \(\mu\); otherwise, it needs to be a vector of the same length as \(\mu\). When context=TRUE, the length of \(\mu\) is 3, otherwise it is 2. The default is 0.
tau0: A positive integer representing the scale parameter of the Normal-Inverse Wishart prior for the mean and variance parameter \((\mu, \Sigma)\). The default is 2.
nu0: A positive integer representing the prior degrees of freedom of the Normal-Inverse Wishart prior for the mean and variance parameter \((\mu, \Sigma)\). The default is 4.
S0: A positive scalar or a positive definite matrix that specifies the prior scale matrix of the Normal-Inverse Wishart prior for the mean and variance parameter \((\mu, \Sigma)\). If it is a scalar, then the prior scale matrix will be a diagonal matrix with the same dimensions as \(\Sigma\) whose diagonal elements all take the value of S0; otherwise, S0 needs to have the same dimensions as \(\Sigma\). When context=TRUE, \(\Sigma\) is a \(3 \times 3\) matrix, otherwise it is \(2 \times 2\). The default is 10.
mu.start: A scalar or a numeric vector that specifies the starting values of the mean parameter \(\mu\). If it is a scalar, then its value will be repeated to yield a vector of the length of \(\mu\); otherwise, it needs to be a vector of the same length as \(\mu\). When context=FALSE, the length of \(\mu\) is 2, otherwise it is 3. The default is 0.
Sigma.start: A scalar or a positive definite matrix that specifies the starting value of the variance matrix \(\Sigma\). If it is a scalar, then the starting value will be a diagonal matrix with the same dimensions as \(\Sigma\) whose diagonal elements all take the value of Sigma.start; otherwise, Sigma.start needs to have the same dimensions as \(\Sigma\). When context=TRUE, \(\Sigma\) is a \(3 \times 3\) matrix, otherwise it is \(2 \times 2\). The default is 10.
parameter: Logical. If TRUE, the Gibbs draws of the population parameters, \(\mu\) and \(\Sigma\), are returned in addition to the in-sample predictions of the missing internal cells, \(W\). The default is TRUE.
grid: Logical. If TRUE, the grid method is used to sample \(W\) in the Gibbs sampler. If FALSE, the Metropolis algorithm is used where candidate draws are sampled from the uniform distribution on the tomography line for each unit. Note that the grid method is significantly slower than the Metropolis algorithm. The default is FALSE.
n.draws: A positive integer. The number of MCMC draws. The default is 5000.
burnin: A non-negative integer. The burn-in interval for the Markov chain; i.e., the number of initial draws that should not be stored. The default is 0.
thin: A non-negative integer. The thinning interval for the Markov chain; i.e., the number of Gibbs draws between the recorded values that are skipped. The default is 0.
verbose: Logical. If TRUE, the progress of the Gibbs sampler is printed to the screen. The default is FALSE.
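
The scalar-expansion rules for mu0 and S0, and the roles of burnin and thin, can be illustrated in base R. This is only a sketch of the behaviour described above, not the package's internal code: the helper names expand_mean, expand_scale and n_stored are hypothetical, and the stored-draw count assumes that, after the burn-in is discarded, one draw is recorded for every thin draws skipped.

```r
## Hypothetical helpers mirroring the documented argument rules.
expand_mean <- function(mu0, context = FALSE) {
  k <- if (context) 3 else 2              # length of mu
  if (length(mu0) == 1) {
    rep(mu0, k)                           # scalar is repeated
  } else {
    stopifnot(length(mu0) == k)           # otherwise lengths must match
    mu0
  }
}

expand_scale <- function(S0, context = FALSE) {
  k <- if (context) 3 else 2              # Sigma is k x k
  if (length(S0) == 1) {
    diag(S0, k)                           # scalar goes on the diagonal
  } else {
    stopifnot(is.matrix(S0), all(dim(S0) == k))
    S0
  }
}

## Assumed bookkeeping: drop `burnin` draws, then keep one draw and
## skip `thin` draws, repeatedly.
n_stored <- function(n.draws, burnin = 0, thin = 0) {
  floor((n.draws - burnin) / (thin + 1))
}

mu_prior <- expand_mean(0, context = TRUE)
S_prior  <- expand_scale(10, context = FALSE)
kept     <- n_stored(5000, burnin = 1000, thin = 4)
```

Under that assumption, the default settings (burnin = 0, thin = 0) would store every one of the n.draws draws.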

Details

An example of a \(2 \times 2\) ecological table for racial voting is given below:

	black voters	white voters
vote	\(W_{1i}\)	\(W_{2i}\)	\(Y_i\)
not vote	\(1-W_{1i}\)	\(1-W_{2i}\)	\(1-Y_i\)
	\(X_i\)	\(1-X_i\)

where \(Y_i\) and \(X_i\) represent the observed margins, and \(W_1\) and \(W_2\) are unknown variables. In this example, \(Y_i\) is the turnout rate in the ith precinct and \(X_i\) is the proportion of African Americans in the ith precinct. The unknowns \(W_{1i}\) and \(W_{2i}\) are the black and white turnout, respectively. All variables are proportions and hence bounded between 0 and 1. For each \(i\), the following deterministic relationship holds: \(Y_i=X_i W_{1i}+(1-X_i)W_{2i}\).
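
Because \(W_{1i}\) and \(W_{2i}\) must both lie between 0 and 1, the deterministic relationship above implies bounds on each unknown. A minimal base R sketch of that computation follows. The function name tomography_bounds and the toy margins are hypothetical, the code assumes \(0 < X_i < 1\), and it illustrates the accounting identity rather than the package's own routine for Wmin and Wmax.

```r
## Bounds implied by Y = X * W1 + (1 - X) * W2 with W1, W2 in [0, 1].
tomography_bounds <- function(X, Y) {
  stopifnot(all(X > 0 & X < 1), all(Y >= 0 & Y <= 1))
  ## Solve for W1 with W2 set to its extremes (1 and 0), then clip.
  W1_min <- pmax(0, (Y - (1 - X)) / X)
  W1_max <- pmin(1, Y / X)
  ## Solve for W2 with W1 set to its extremes (1 and 0), then clip.
  W2_min <- pmax(0, (Y - X) / (1 - X))
  W2_max <- pmin(1, Y / (1 - X))
  cbind(W1_min, W1_max, W2_min, W2_max)
}

## Toy margins for three precincts (made-up numbers).
X <- c(0.2, 0.5, 0.8)
Y <- c(0.6, 0.3, 0.9)
b <- tomography_bounds(X, Y)

## Any W1 inside its bounds pins down W2 on the tomography line.
W1 <- (b[, "W1_min"] + b[, "W1_max"]) / 2
W2 <- (Y - X * W1) / (1 - X)
```

Each pair (W1, W2) obtained this way satisfies the identity exactly, which is why candidate draws for a unit can be restricted to the line segment between the bounds.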

References

Imai, Kosuke, Ying Lu and Aaron Strauss. (2011). “eco: R Package for Ecological Inference in 2x2 Tables” Journal of Statistical Software, Vol. 42, No. 5, pp. 1-23.

Imai, Kosuke, Ying Lu and Aaron Strauss. (2008). “Bayesian and Likelihood Inference for 2 x 2 Ecological Tables: An Incomplete Data Approach” Political Analysis, Vol. 16, No. 1 (Winter), pp. 41-69.

Examples

## load the registration data
data(reg)

## NOTE: convergence has not been properly assessed for the following
## examples. See Imai, Lu and Strauss (2008, 2011) for more
## complete analyses.

## fit the parametric model with the default prior specification
res <- eco(Y ~ X, data = reg, verbose = TRUE)
## summarize the results
summary(res)

## obtain out-of-sample prediction
out <- predict(res, verbose = TRUE)
## summarize the results
summary(out)

## load Robinson's census data
data(census)

## fit the parametric model with contextual effects and N 
## using the default prior specification
res1 <- eco(Y ~ X, N = N, context = TRUE, data = census, verbose = TRUE)
## summarize the results
summary(res1)

## obtain out-of-sample prediction
out1 <- predict(res1, verbose = TRUE)
## summarize the results
summary(out1)
