ecoML is used to fit parametric models for ecological
inference in $2 \times 2$ tables via Expectation Maximization (EM)
algorithms. The data are specified as proportions. In its most basic setting, the algorithm
assumes that the individual-level proportions (i.e., $W_1$ and $W_2$) are distributed bivariate normally (after logit
transformations). The function calculates point estimates of the parameters for models
based on different assumptions. The standard errors of the point
estimates are also computed via Supplemented EM (SEM) algorithms. Moreover,
ecoML quantifies the amount of missing information associated
with each parameter and allows researchers to examine the impact of
missing information on parameter estimation in ecological
inference. The models and algorithms are described in Imai,
Lu and Strauss (Forthcoming).

Usage

ecoML(formula, data = parent.frame(), N = NULL, supplement = NULL,
theta.start = c(0,0,1,1,0), fix.rho = FALSE,
context = FALSE, sem = TRUE, epsilon = 10^(-10),
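maxit = 1000, loglik = TRUE, hyptest = FALSE, verbose = FALSE)

Arguments

- formula: Y ~ X specifies Y as the column margin (e.g., turnout) and X (e.g., percent African American) as the row margin.
- data: An optional data frame in which to interpret the variables in formula. The default is the environment in which ecoML is called.
- supplement: Optional individual-level data. If NULL, no additional individual-level data are included. The default is NULL.
- fix.rho: If TRUE, the correlation (when context = TRUE) or the partial correlation (when context = FALSE) between $W_1$ and $W_2$ is fixed throughout the estimation. For details, see Imai, Lu and Strauss (2006). The default is FALSE.
- context: If TRUE, the contextual effect is also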
modeled. In this case, the row margin (i.e., X) and the individual-level rates
(i.e., $W_1$ and $W_2$) are assumed to be distributed trivariate normally
(after logit transformations). The default is FALSE.
- sem: If TRUE, the standard errors of the parameter estimates are computed via the SEM algorithm, along with the fraction of missing information. The default is TRUE.
- theta.start: Starting values for the parameters. The default is c(0,0,1,1,0). If context = FALSE,
the elements of theta.start correspond to ($E(W_1)$, $E(W_2)$, $var(W_1)$, $var(W_2)$, $cor(W_1,W_2)$).
- epsilon: The convergence criterion for the SEM algorithm. The default is 10^(-10).
- maxit: The maximum number of iterations. The default is 1000.
- loglik: If TRUE, the value of the log-likelihood
function at each iteration of EM is saved. The default is TRUE.
- hyptest: If TRUE, the model is estimated under the null hypothesis that the means of $W_1$ and $W_2$ are the same. The default is FALSE.
- verbose: If TRUE, the progress of the EM and SEM algorithms is printed to the screen. The default is FALSE.

Value

An object of class ecoML containing the following elements:
- If context = FALSE, the CAR assumption is adopted and no
contextual effect is modeled. If context = TRUE, the NCAR
assumption is adopted and the contextual effect is modeled.
- If fix.rho = TRUE, the value to which $corr(W_1, W_2)$ is fixed.
- If context = TRUE, $E(X)$, $cov(W_1,X)$, and $cov(W_2,X)$ are also reported.
- … theta.em.
- (… verbose = TRUE).
- (… verbose = TRUE).

If sem = TRUE, ecoML also outputs the following values:
- When context = FALSE and fix.rho = TRUE,
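Icom is 4 by 4. When context = FALSE and fix.rho = FALSE,
Icom is 5 by 5. When context = TRUE, Icom is 9 by 9.
- The dimension of Iobs is the same as that of Icom.
- … Icom and Iobs. The dimension of Imiss is the same as that of Icom.
- … The dimension of Vobs is the same as that of Icom.
- … The dimension of Iobs is the same as that of Icom.
- … The dimension of Vobs is the same as that of Icom.
- … Imiss.

Details

If sem = TRUE, ecoML computes the observed-data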
information matrix for the parameters of interest based on the Supplemented EM
algorithm. The inverse of the observed-data information matrix can be used
to estimate the variance-covariance matrix for the parameters estimated
by the EM algorithm. It also computes the expected complete-data
information matrix. Based on these two measures, one can further calculate
the fraction of missing information associated with each parameter. See
Imai, Lu and Strauss (2006) for more details about fraction of missing
information.
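The relationships among these matrices can be illustrated with a small base R sketch. It uses hypothetical 2 × 2 information matrices, not ecoML output. It assumes the standard Supplemented EM identities:

- the missing information is $I_{miss} = I_{com} - I_{obs}$;
- the EM rate matrix is $DM = I_{miss} I_{com}^{-1}$;
- the observed-data variance is $V_{obs} = I_{com}^{-1}(I - DM)^{-1}$.

The per-parameter fraction of missing information is taken here as diag(Imiss)/diag(Icom). This is one common definition and may not be exactly the quantity ecoML reports.

```r
## Hypothetical complete-data information matrix (Icom) and missing
## information (Imiss) for two parameters; illustrative values only,
## not output from ecoML.
Icom  <- matrix(c(4, 1,
                  1, 2), nrow = 2, byrow = TRUE)
Imiss <- matrix(c(1.0, 0.5,
                  0.5, 1.0), nrow = 2, byrow = TRUE)

## Observed-data information: complete-data minus missing information.
Iobs <- Icom - Imiss

## EM rate matrix. SEM estimates this numerically from the EM iterations;
## here it is computed directly from the hypothetical matrices.
DM <- Imiss %*% solve(Icom)

## SEM variance identity: Vobs = Icom^{-1} (I - DM)^{-1}, equal to solve(Iobs).
Vcom <- solve(Icom)
Vobs <- Vcom %*% solve(diag(nrow(DM)) - DM)

## One common per-parameter fraction of missing information (assumed
## definition): diagonal of Imiss relative to the diagonal of Icom.
Fmis <- diag(Imiss) / diag(Icom)

print(Vobs)
print(Fmis)
```

The identity follows from $I_{obs} = (I - DM) I_{com}$: inverting both sides gives $V_{obs}$. A variance-based alternative summary is 1 - diag(Vcom)/diag(Vobs).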
Moreover, when hyptest = TRUE, ecoML estimates the
parametric model under the null hypothesis that $\mu_1 = \mu_2$ (i.e., the means of $W_1$ and $W_2$ are equal). One
can then construct a likelihood ratio test to assess the hypothesis of
equal means. The associated fraction of missing information for the test
statistic can also be calculated. For details, see Imai, Lu
and Strauss (2006).

See Also

eco, ecoNP, summary.ecoML

Examples

## load the eco package and the census data
library(eco)
data(census)
## NOTE: convergence has not been properly assessed for the following
## examples. See Imai, Lu and Strauss (2006) for more complete analyses.
## In the first example below, in the interest of time, only part of the
## data set is analyzed and the convergence requirement is less stringent
## than the default setting.
## In the second example, the program is arbitrarily halted 100 iterations
## into the simulation, before convergence.
## fit the parametric model with the default model specifications
res <- ecoML(Y ~ X, data = census[1:100, ], N = census[1:100, 3],
             epsilon = 10^(-6), verbose = TRUE)
## summarize the results
summary(res)
## obtain out-of-sample prediction
out <- predict(res, verbose = TRUE)
## summarize the results
summary(out)
## fit the parametric model with some individual
## level data using the default prior specification
surv <- 1:600
res1 <- ecoML(Y ~ X, context = TRUE, data = census[-surv, ],
              supplement = census[surv, c(4:5, 1)], maxit = 100, verbose = TRUE)
## summarize the results
summary(res1)
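The likelihood ratio test of equal means described above (hyptest = TRUE) can be sketched in base R. The two log-likelihood values below are hypothetical placeholders, not ecoML output. The sketch does not assume any particular way of extracting them from a fitted object. It uses the standard large-sample approximation: twice the log-likelihood gap is compared with a chi-squared distribution with one degree of freedom, since $\mu_1 = \mu_2$ imposes a single restriction.

```r
## Hypothetical maximized log-likelihoods (illustrative values only):
## loglik.full from the unrestricted fit, loglik.null from the fit
## estimated under the null hypothesis mu_1 = mu_2.
loglik.full <- -100
loglik.null <- -102

## Likelihood ratio statistic: twice the gap in maximized log-likelihoods.
lrt <- 2 * (loglik.full - loglik.null)

## One restriction under the null, so use a chi-squared(1) reference.
p.value <- pchisq(lrt, df = 1, lower.tail = FALSE)

cat("LRT statistic:", lrt, " p-value:", round(p.value, 4), "\n")
```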