eco is used to fit the parametric Bayesian model (based on a
Normal/Inverse-Wishart prior) for ecological inference in \(2 \times 2\)
tables via Markov chain Monte Carlo. It gives the in-sample predictions as
well as the estimates of the model parameters. The model and algorithm are
described in Imai, Lu and Strauss (2008, 2011).
eco(
formula,
data = parent.frame(),
N = NULL,
supplement = NULL,
context = FALSE,
mu0 = 0,
tau0 = 2,
nu0 = 4,
S0 = 10,
mu.start = 0,
Sigma.start = 10,
parameter = TRUE,
grid = FALSE,
n.draws = 5000,
burnin = 0,
thin = 0,
verbose = FALSE
)
An object of class eco containing the following elements:
The matched call.
The row margin, \(X\).
The column margin, \(Y\).
The size of each table, \(N\).
The number of initial burnin draws.
The thinning interval.
The prior degrees of freedom.
The prior scale parameter.
The prior mean.
The prior scale matrix.
A three dimensional array storing the posterior in-sample predictions of \(W\). The first dimension indexes the Monte Carlo draws, the second dimension indexes the columns of the table, and the third dimension represents the observations.
A numeric matrix storing the lower bounds of \(W\).
A numeric matrix storing the upper bounds of \(W\).
The
following additional elements are included in the output when
parameter = TRUE.
The posterior draws of the population mean parameter, \(\mu\).
The posterior draws of the population variance matrix, \(\Sigma\).
A symbolic description of the model to be fit, specifying the
column and row margins of \(2 \times 2\) ecological tables. Y ~ X
specifies Y as the column margin (e.g., turnout) and X as the
row margin (e.g., percent African-American). Details and specific examples
are given below.
An optional data frame in which to interpret the variables in
formula. The default is the environment in which eco is
called.
An optional variable representing the size of the unit; e.g., the
total number of voters. N needs to be either a scalar or a vector of the
same length as Y and X.
An optional matrix of supplemental data. The matrix has
two columns, which contain additional individual-level data such as survey
data for \(W_1\) and \(W_2\), respectively. If NULL, no
additional individual-level data are included in the model. The default is
NULL.
Logical. If TRUE, the contextual effect is also
modeled; that is, the row margin \(X\) and the unknown \(W_1\)
and \(W_2\) are assumed to be correlated. See Imai, Lu and Strauss (2008, 2011) for
details. The default is FALSE.
A scalar or a numeric vector that specifies the prior mean for
the mean parameter \(\mu\) for \((W_1,W_2)\) (or for \((W_1, W_2, X)\)
if context=TRUE). If mu0 is a scalar, its value
is repeated to yield a vector of the length of \(\mu\); otherwise, it
needs to be a vector of the same length as \(\mu\). When context=TRUE,
the length of \(\mu\) is 3; otherwise it is 2. The default is 0.
A positive integer representing the scale parameter of the
Normal-Inverse Wishart prior for the mean and variance parameter \((\mu,
\Sigma)\). The default is 2.
A positive integer representing the prior degrees of freedom of
the Normal-Inverse Wishart prior for the mean and variance parameter
\((\mu, \Sigma)\). The default is 4.
A positive scalar or a positive definite matrix that specifies the
prior scale matrix of the Normal-Inverse Wishart prior for the mean and
variance parameter \((\mu, \Sigma)\) . If it is a scalar, then the prior
scale matrix will be a diagonal matrix with the same dimensions as
\(\Sigma\) and the diagonal elements all take value of S0,
otherwise S0 needs to have same dimensions as \(\Sigma\). When
context=TRUE, \(\Sigma\) is a \(3 \times 3\) matrix, otherwise,
it is \(2 \times 2\). The default is 10.
A scalar or a numeric vector that specifies the starting
values of the mean parameter \(\mu\). If it is a scalar, its value
is repeated to yield a vector of the length of \(\mu\); otherwise, it
needs to be a vector of the same length as \(\mu\). When
context=FALSE, the length of \(\mu\) is 2; otherwise it is 3. The
default is 0.
A scalar or a positive definite matrix that specifies the
starting value of the variance matrix \(\Sigma\). If it is a scalar, then
the starting value will be a diagonal matrix with the same dimensions as
\(\Sigma\) and the diagonal elements all take the value of Sigma.start;
otherwise Sigma.start needs to have the same dimensions as \(\Sigma\). When
context=TRUE, \(\Sigma\) is a \(3 \times 3\) matrix; otherwise,
it is \(2 \times 2\). The default is 10.
Logical. If TRUE, the Gibbs draws of the population
parameters, \(\mu\) and \(\Sigma\), are returned in addition to the
in-sample predictions of the missing internal cells, \(W\). The default is
TRUE.
Logical. If TRUE, the grid method is used to sample
\(W\) in the Gibbs sampler. If FALSE, the Metropolis algorithm is
used where candidate draws are sampled from the uniform distribution on the
tomography line for each unit. Note that the grid method is significantly
slower than the Metropolis algorithm. The default is FALSE.
A positive integer. The number of MCMC draws. The default is
5000.
A non-negative integer. The burnin interval for the Markov chain;
i.e., the number of initial draws that should not be stored. The default is
0.
A non-negative integer. The thinning interval for the Markov chain;
i.e., the number of Gibbs draws between the recorded values that are skipped.
The default is 0.
Logical. If TRUE, the progress of the Gibbs sampler is
printed to the screen. The default is FALSE.
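The scalar handling described above for mu0 and S0, and the effect of burnin and thin on the number of stored draws, can be sketched in base R. This is only an illustration of the documented rules, not the package's internal code: the helper names expand_mu0, expand_S0 and n_stored are hypothetical, and the stored-draw count assumes that one draw is kept out of every thin + 1 draws after burnin, as the description of thin suggests.

```r
# Hypothetical helpers illustrating the documented argument rules;
# they are not part of the eco package.

# Repeat a scalar prior mean to the length of mu (2, or 3 when context = TRUE).
expand_mu0 <- function(mu0, context = FALSE) {
  ndim <- if (context) 3 else 2
  if (length(mu0) == 1) mu0 <- rep(mu0, ndim)
  stopifnot(length(mu0) == ndim)
  mu0
}

# Turn a scalar S0 into a diagonal matrix with the dimensions of Sigma.
expand_S0 <- function(S0, context = FALSE) {
  ndim <- if (context) 3 else 2
  if (length(S0) == 1) S0 <- diag(S0, ndim)
  stopifnot(is.matrix(S0), all(dim(S0) == ndim))
  S0
}

# Assumed rule: discard burnin draws, then keep one draw out of every thin + 1.
n_stored <- function(n.draws, burnin = 0, thin = 0) {
  floor((n.draws - burnin) / (thin + 1))
}

mu0 <- expand_mu0(0, context = TRUE)            # numeric vector of length 3
S0 <- expand_S0(10)                             # 2 x 2 diagonal matrix
kept <- n_stored(5000, burnin = 1000, thin = 4) # draws retained under the assumed rule
```

Under this reading, the defaults (burnin = 0, thin = 0) store every one of the n.draws draws, so a non-zero burnin is usually worth setting explicitly.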
An example of a \(2 \times 2\) ecological table for racial voting is given below:
| | black voters | white voters | |
| vote | \(W_{1i}\) | \(W_{2i}\) | \(Y_i\) |
| not vote | \(1-W_{1i}\) | \(1-W_{2i}\) | \(1-Y_i\) |
| | \(X_i\) | \(1-X_i\) | |
where \(Y_i\) and \(X_i\) represent the observed margins, and \(W_{1i}\) and \(W_{2i}\) are unknown variables. In this example, \(Y_i\) is the turnout rate in the ith precinct and \(X_i\) is the proportion of African Americans in the ith precinct. The unknowns \(W_{1i}\) and \(W_{2i}\) are the black and white turnout rates, respectively. All variables are proportions and hence bounded between 0 and 1. For each \(i\), the following deterministic relationship holds: \(Y_i=X_i W_{1i}+(1-X_i)W_{2i}\).
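Because all quantities lie between 0 and 1, the deterministic relationship restricts each unknown to an interval determined by the observed margins. The base R sketch below computes those intervals and checks that a point on the line reproduces \(Y_i\). The helper name w_bounds and the input values are hypothetical and chosen for illustration only; the sketch assumes \(0 < X_i < 1\) and is not the package's own implementation.

```r
# Hypothetical helper: bounds on W1 and W2 implied by
# Y = X * W1 + (1 - X) * W2 with all quantities in [0, 1].
# Assumes 0 < X < 1 so that both divisions are well defined.
w_bounds <- function(X, Y) {
  cbind(
    W1.min = pmax(0, (Y - (1 - X)) / X),
    W1.max = pmin(1, Y / X),
    W2.min = pmax(0, (Y - X) / (1 - X)),
    W2.max = pmin(1, Y / (1 - X))
  )
}

# Illustrative margins for two units (made-up numbers, not package data).
X <- c(0.25, 0.5)
Y <- c(0.5, 0.75)
b <- w_bounds(X, Y)

# Fixing W1 anywhere within its bounds pins down W2 on the line;
# here W1 is set to its upper bound, which puts W2 at its lower bound.
W1 <- b[, "W1.max"]
W2 <- (Y - X * W1) / (1 - X)
```

The closer \(X_i\) is to 0 or 1, the tighter the interval for one of the two unknowns becomes, which is why units with extreme row margins are the most informative.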
Imai, Kosuke, Ying Lu and Aaron Strauss. (2011). “eco: R Package for Ecological Inference in 2x2 Tables” Journal of Statistical Software, Vol. 42, No. 5, pp. 1-23.
Imai, Kosuke, Ying Lu and Aaron Strauss. (2008). “Bayesian and Likelihood Inference for 2 x 2 Ecological Tables: An Incomplete Data Approach” Political Analysis, Vol. 16, No. 1 (Winter), pp. 41-69.
ecoML, ecoNP, predict.eco, summary.eco
## load the registration data
data(reg)
## NOTE: convergence has not been properly assessed for the following
## examples. See Imai, Lu and Strauss (2008, 2011) for more
## complete analyses.
## fit the parametric model with the default prior specification
res <- eco(Y ~ X, data = reg, verbose = TRUE)
## summarize the results
summary(res)
## obtain out-of-sample prediction
out <- predict(res, verbose = TRUE)
## summarize the results
summary(out)
## load Robinson's census data
data(census)
## fit the parametric model with contextual effects and N
## using the default prior specification
res1 <- eco(Y ~ X, N = N, context = TRUE, data = census, verbose = TRUE)
## summarize the results
summary(res1)
## obtain out-of-sample prediction
out1 <- predict(res1, verbose = TRUE)
## summarize the results
summary(out1)