ecoNP: Fitting the Nonparametric Bayesian Model of Ecological Inference in 2x2 Tables

Description

ecoNP is used to fit the nonparametric Bayesian model (based on a Dirichlet process prior) for ecological inference in $2 \times 2$ tables via Markov chain Monte Carlo. It gives in-sample predictions as well as out-of-sample predictions for population inference. The model and algorithm are described in Imai and Lu (2004). The contextual effect can also be modeled by following the strategy described in Imai and Lu (2005).
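For readers new to the Dirichlet process prior, the following minimal sketch draws a truncated stick-breaking approximation of a random distribution $G \sim \mathrm{DP}(\alpha, G_0)$ and shows that unit-level parameters drawn from $G$ tie into clusters. It is a generic illustration of the prior only, not the sampler ecoNP uses; the helper name `rdp_stick`, the truncation level `K`, and the standard bivariate normal base measure are assumptions made for illustration.

```r
## Truncated stick-breaking draw from a Dirichlet process DP(alpha, G0).
## Hypothetical helper for illustration; NOT ecoNP's internal algorithm.
rdp_stick <- function(alpha, K = 50) {
  v <- rbeta(K, 1, alpha)                  # stick-breaking proportions
  v[K] <- 1                                # close the stick at truncation level K
  w <- v * cumprod(c(1, 1 - v[-K]))        # weights pi_k = v_k * prod_{j<k} (1 - v_j)
  theta <- matrix(rnorm(2 * K), ncol = 2)  # atoms from an assumed N(0, I) base measure
  list(weights = w, atoms = theta)
}

set.seed(1)
G <- rdp_stick(alpha = 1)
## draw 100 unit-level parameters from G; many units share the same atom
idx <- sample(seq_along(G$weights), 100, replace = TRUE, prob = G$weights)
length(unique(idx))  # number of distinct clusters among the 100 draws
```

Smaller values of $\alpha$ concentrate the weights on fewer atoms, which is why the concentration parameter controls how many clusters the model tends to form.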

Usage

ecoNP(formula, data = parent.frame(), N = NULL, supplement = NULL,
      context = FALSE, mu0 = 0, tau0 = 2, nu0 = 4, S0 = 10, 
      alpha = NULL, a0 = 1, b0 = 0.1, parameter = FALSE, 
      grid = FALSE, n.draws = 5000, burnin = 0, thin = 0, 
      verbose = FALSE)

Arguments

formula

A symbolic description of the model to be fit, specifying the column and row margins of $2 \times 2$ ecological tables. Y ~ X specifies Y as the column margin and X as the row margin. Details and specific examples are given below.

data

An optional data frame in which to interpret the variables in formula. The default is the environment in which ecoNP is called.

N

An optional variable representing the size of the unit; e.g., the total number of voters.

supplement

An optional matrix of supplemental data. The matrix has two columns, which contain additional individual-level data such as survey data for $W_1$ and $W_2$, respectively. If NULL, no additional individual-level data are included in the model. The default is NULL.

context

Logical. If TRUE, the contextual effect is also modeled. See Imai and Lu (2005) for details. The default is FALSE.

mu0

A scalar or a numeric vector that specifies the prior mean for the mean parameter $\mu$. If it is a scalar, its value is repeated to yield a vector of the same length as $\mu$; otherwise, it must be a vector of the same length as $\mu$. The default is 0.
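The recycling rule for mu0 can be sketched as follows. The helper name `expand_mu0` is hypothetical, and the length `k` of $\mu$ is passed in explicitly; we assume it is 2 without contextual effects and 3 when context = TRUE, which this page does not state.

```r
## Hypothetical helper mirroring the documented rule for mu0:
## a scalar is repeated to length k; a vector must already have length k.
expand_mu0 <- function(mu0, k) {
  if (length(mu0) == 1) return(rep(mu0, k))
  if (length(mu0) != k) stop("mu0 must be a scalar or a vector of length ", k)
  mu0
}

expand_mu0(0, k = 2)            # returns c(0, 0)
expand_mu0(c(0, 1, 2), k = 3)   # returned unchanged
```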

tau0

A positive integer representing the prior scale for the mean parameter $\mu$. The default is 2.

nu0

A positive integer representing the prior degrees of freedom of the variance matrix $\Sigma$. The default is 4.

S0

A positive scalar or a positive definite matrix that specifies the prior scale matrix for the variance matrix $\Sigma$. If it is a scalar, then the prior scale matrix will be a diagonal matrix with the same dimensions as $\Sigma$ and all diagonal elements equal to S0; otherwise, it must be a matrix with the same dimensions as $\Sigma$. The default is 10.
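A minimal sketch of how a scalar S0 expands into a prior scale matrix, assuming $\Sigma$ is $k \times k$ (we assume $k = 2$, or 3 when context = TRUE). The helper `expand_S0` is hypothetical, and the eigenvalue-based positive-definiteness check is our own addition rather than a documented behavior of ecoNP.

```r
## Hypothetical helper mirroring the documented rule for S0:
## a scalar becomes a k x k diagonal matrix with that value on the diagonal;
## a matrix must already be k x k (and, in this sketch, positive definite).
expand_S0 <- function(S0, k) {
  if (length(S0) == 1) return(diag(S0, nrow = k))
  S0 <- as.matrix(S0)
  if (!all(dim(S0) == k)) stop("S0 must be a scalar or a ", k, " x ", k, " matrix")
  if (any(eigen(S0, symmetric = TRUE, only.values = TRUE)$values <= 0))
    stop("S0 must be positive definite")
  S0
}

expand_S0(10, k = 2)  # 2 x 2 matrix with 10 on the diagonal and 0 elsewhere
```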

alpha

A positive scalar representing a user-specified fixed value of the concentration parameter, $\alpha$. If NULL, $\alpha$ will be updated at each Gibbs draw, and its prior parameters a0 and b0 need to be specified. The default is NULL.

a0

A positive integer representing the value of the shape parameter of the gamma prior distribution for $\alpha$. The default is 1.

b0

A positive scalar representing the value of the scale parameter of the gamma prior distribution for $\alpha$. The default is 0.1.
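To see what a0 and b0 imply, the sketch below draws $\alpha$ from a gamma distribution with shape a0 and scale b0, following the wording above (whether ecoNP parameterizes the gamma by scale or by rate internally is not confirmed here). It then evaluates the standard Dirichlet process result $E[K_n \mid \alpha] = \sum_{i=1}^{n} \alpha / (\alpha + i - 1)$ for the expected number of clusters among $n$ units, a quantity that can be compared with the nstar draws. The helper name `expected_clusters` is hypothetical.

```r
## Prior on the concentration parameter: Gamma(shape = a0, scale = b0),
## following the argument descriptions above.
a0 <- 1
b0 <- 0.1
set.seed(2)
alpha_prior <- rgamma(10000, shape = a0, scale = b0)
mean(alpha_prior)  # close to a0 * b0 = 0.1

## Expected number of clusters among n units under a DP with concentration alpha:
## E[K_n | alpha] = sum_{i=1}^{n} alpha / (alpha + i - 1)
expected_clusters <- function(alpha, n) sum(alpha / (alpha + seq_len(n) - 1))

expected_clusters(alpha = 1, n = 10)  # the harmonic number H_10, about 2.93
```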

parameter

Logical. If TRUE, the Gibbs draws of the population parameters, $\mu$ and $\Sigma$, are returned in addition to the in-sample predictions of the missing internal cells, $W$. The default is FALSE. This needs to be set to TRUE if you wish to make population inferences (out-of-sample predictions) with predict.

grid

Logical. If TRUE, the grid method is used to sample $W$ in the Gibbs sampler. If FALSE, the Metropolis algorithm is used, where candidate draws are sampled from the uniform distribution on the tomography line for each unit. The default is FALSE.
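The candidate step can be sketched in base R. For unit $i$, the tomography line is $W_{2i} = (Y_i - X_i W_{1i}) / (1 - X_i)$ restricted to the unit square, so drawing $W_{1i}$ uniformly between its bounds and solving for $W_{2i}$ gives a point drawn uniformly along the line segment (because $W_{2i}$ is linear in $W_{1i}$). The helper `tomography_candidate` is hypothetical; it assumes $0 < X_i < 1$ and omits the Metropolis acceptance step and everything else in ecoNP's actual implementation.

```r
## Draw one candidate (W1, W2) uniformly on the tomography line
## Y = X * W1 + (1 - X) * W2 within [0, 1]^2 (assumes 0 < X < 1).
tomography_candidate <- function(X, Y) {
  w1_lo <- max(0, (Y - (1 - X)) / X)  # W2 <= 1 implies this lower bound on W1
  w1_hi <- min(1, Y / X)              # W2 >= 0 implies this upper bound on W1
  w1 <- runif(1, w1_lo, w1_hi)
  w2 <- (Y - X * w1) / (1 - X)
  c(W1 = w1, W2 = w2)
}

set.seed(3)
cand <- tomography_candidate(X = 0.3, Y = 0.5)
0.3 * cand[["W1"]] + 0.7 * cand[["W2"]]  # recovers Y = 0.5 up to rounding
```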

n.draws

A positive integer. The number of MCMC draws. The default is 5000.

burnin

A non-negative integer. The burn-in interval for the Markov chain; i.e., the number of initial draws that are not stored. The default is 0.

thin

A non-negative integer. The thinning interval for the Markov chain; i.e., the number of Gibbs draws skipped between recorded values. The default is 0.
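The interaction of burnin and thin can be sketched by listing which iterations would be stored. This rests on two assumptions not confirmed on this page: that n.draws counts every iteration including the burn-in, and that thin = t keeps one draw and then skips the next t. The helper `kept_iterations` is hypothetical.

```r
## Hypothetical helper: indices of stored iterations, assuming n.draws
## includes the burn-in and thin = t skips t draws between stored values.
kept_iterations <- function(n.draws, burnin = 0, thin = 0) {
  seq(from = burnin + 1, to = n.draws, by = thin + 1)
}

kept_iterations(n.draws = 10, burnin = 2, thin = 1)  # 3 5 7 9
length(kept_iterations(5000))                        # 5000 with the defaults
```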

verbose

Logical. If TRUE, the progress of the Gibbs sampler is printed to the screen. The default is FALSE.

Value

An object of class ecoNP containing the following elements:

call: The matched call.
X: The row margin, $X$.
Y: The column margin, $Y$.
burnin: The number of initial burn-in draws.
thin: The thinning interval.
nu0: The prior degrees of freedom.
tau0: The prior scale parameter for $\mu$.
mu0: The prior mean.
S0: The prior scale matrix.
a0: The prior shape parameter for $\alpha$.
b0: The prior scale parameter for $\alpha$.
W: A three-dimensional array storing the posterior in-sample predictions of $W$. The first dimension indexes the Monte Carlo draws, the second dimension indexes the columns of the table, and the third dimension represents the observations.
Wmin: A numeric matrix storing the lower bounds of $W$.
Wmax: A numeric matrix storing the upper bounds of $W$.

The following additional elements are included in the output when parameter = TRUE:

mu: A three-dimensional array storing the posterior draws of the population mean parameter, $\mu$. The first dimension indexes the Monte Carlo draws, the second dimension indexes the columns of the table, and the third dimension represents the observations.
Sigma: A three-dimensional array storing the posterior draws of the population variance matrix, $\Sigma$. The first dimension indexes the Monte Carlo draws, the second dimension indexes the parameters, and the third dimension represents the observations.
alpha: The posterior draws of $\alpha$.
nstar: The number of clusters at each Gibbs draw.
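As an illustration of the array layout described for W, the sketch below builds a stand-in array with dimensions draws x 2 columns x observations from simulated values, then computes per-observation posterior means and 95% intervals with apply. The array `W_sim` is fabricated for illustration only; with a fitted object one would substitute res$W, assuming its layout matches the description above.

```r
## Stand-in for the W element: n_draws x 2 x n_obs (simulated, not real output)
set.seed(4)
n_draws <- 200
n_obs <- 5
W_sim <- array(runif(n_draws * 2 * n_obs), dim = c(n_draws, 2, n_obs))

## Posterior means of W1 and W2 for each observation: a 2 x n_obs matrix
W_mean <- apply(W_sim, c(2, 3), mean)

## 95% posterior interval of W1 for each observation: a 2 x n_obs matrix
W1_ci <- apply(W_sim[, 1, ], 2, quantile, probs = c(0.025, 0.975))

dim(W_mean)  # 2 5
```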

Details

An example of a $2 \times 2$ ecological table for racial voting is given below:

                black voters    white voters
    Voted       $W_{1i}$        $W_{2i}$        $Y_i$
    Not voted   $1-W_{1i}$      $1-W_{2i}$      $1-Y_i$
                $X_i$           $1-X_i$

where $Y_i$ and $X_i$ represent the observed margins, and $W_{1i}$ and $W_{2i}$ are unknown variables. All variables are proportions and hence bounded between 0 and 1. For each $i$, the following deterministic relationship holds: $Y_i = X_i W_{1i} + (1-X_i) W_{2i}$.
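Combining this identity with the constraint that every cell lies in $[0, 1]$ yields deterministic bounds on $W_{1i}$ and $W_{2i}$ (the method of bounds), of the kind stored in Wmin and Wmax. The sketch below computes them for vectors X and Y. The helper name `w_bounds` is hypothetical, the assumption that Wmin and Wmax hold exactly these bounds is ours, and the code assumes $0 < X_i < 1$.

```r
## Deterministic bounds on W1 and W2 implied by Y = X * W1 + (1 - X) * W2
## with all quantities in [0, 1]; assumes 0 < X < 1 elementwise.
## (Assumption: Wmin/Wmax in the ecoNP output correspond to these bounds.)
w_bounds <- function(X, Y) {
  cbind(W1min = pmax(0, (Y - (1 - X)) / X),
        W1max = pmin(1, Y / X),
        W2min = pmax(0, (Y - X) / (1 - X)),
        W2max = pmin(1, Y / (1 - X)))
}

w_bounds(X = c(0.2, 0.8), Y = c(0.5, 0.9))
## row 1: W1 in [0, 1],     W2 in [0.375, 0.625]
## row 2: W1 in [0.875, 1], W2 in [0.5, 1]
```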

References

Imai, Kosuke and Ying Lu. (2004) Parametric and Nonparametric Bayesian Models for Ecological Inference in $2 \times 2$ Tables. Proceedings of the American Statistical Association. http://www.princeton.edu/~kimai/research/einonpar.html

Imai, Kosuke and Ying Lu. (2005) An Incomplete Data Approach to Ecological Inference. Working Paper, Princeton University, available at http://www.princeton.edu/~kimai/research/einonpar.html

Examples


## load the eco package and the registration data
library(eco)
data(reg)

## NOTE: We set the number of MCMC draws to be a very small number in
## the following examples; i.e., convergence has not been properly
## assessed. See Imai and Lu (2004, 2005) for more complete examples.

## fit the nonparametric model to give in-sample predictions
## store the parameters to make population inference later
res <- ecoNP(Y ~ X, data = reg, n.draws = 50, parameter = TRUE, verbose = TRUE)
## summarize the results
summary(res)

## obtain out-of-sample prediction
out <- predict(res, verbose = TRUE)
## summarize the results
summary(out)

## density plots of the out-of-sample predictions
par(mfrow = c(2, 1))
plot(density(out[, 1]), main = "W1")
plot(density(out[, 2]), main = "W2")


## load Robinson's census data
data(census)

## fit the nonparametric model with contextual effects and N
## using the default prior specification
res1 <- ecoNP(Y ~ X, N = N, context = TRUE, parameter = TRUE, data = census,
              n.draws = 25, verbose = TRUE)
## summarize the results
summary(res1)

## out-of-sample prediction
pres1 <- predict(res1)
summary(pres1)
