Description

Analysis of rollcall data via the spatial voting model; analogous to
fitting educational testing data via an item-response model. Model
fitting via Markov chain Monte Carlo (MCMC).

Usage

ideal(object, codes = object$codes,
      dropList = list(codes = "notInLegis", lop = 0),
      d = 1, maxiter = 10000, thin = 100, burnin = 5000,
      impute = FALSE,
      mda = TRUE,
      normalize = FALSE,
      meanzero = normalize,
      priors = NULL, startvals = "eigen",
      store.item = FALSE, file = NULL,
      verbose = FALSE)
Arguments

object: an object of class rollcall.

codes: a list describing the types of voting decisions in the roll call
matrix; defaults to object$codes, the codes component of the rollcall
object.

dropList: a list listing voting decisions, legislators and/or votes to be
dropped from the analysis; see dropRollCall.

d: numeric, positive integer, the number of dimensions to fit; defaults
to 1.

maxiter: numeric, positive integer, a multiple of thin; the number of
MCMC iterations to run.

thin: numeric, positive integer, the thinning interval used when
recording MCMC iterations.

burnin: the number of MCMC iterations to run before recording begins.
The iteration numbered burnin will be recorded. Must be a multiple of
thin.

impute: logical, whether to treat missing entries of the rollcall matrix
as missing at random, sampling from the predictive density of the
missing entries at each MCMC iteration.

mda: logical, do marginal data augmentation (see Details); default is
TRUE.

normalize: logical, impose identification with the constraint that the
ideal points have mean zero and standard deviation one. This option is
only functional for unidimensional models (i.e., d=1).

meanzero: to be deprecated; use normalize instead.

priors: a list of parameters (means and precisions) specifying normal
priors for the legislators' ideal points. The default is NULL, in which
case the normal priors used have mean zero and precision 1 (i.e.,
variance 1) for the ideal points (abilities).

startvals: "eigen" (the default) or "random"; or a list containing start
values for legislators' ideal points and item parameters. See Details.

store.item: logical, whether item discrimination parameters should be
stored. Storing item discrimination parameters can consume a large
amount of memory. These need to be stored for prediction; see
predict.ideal.

file: string, the name of a file to which MCMC output is written; the
default is NULL, in which case MCMC output is stored in memory. Note
that post-estimation commands like plot will not work unless MCMC output
is stored in memory.

verbose: logical; the default is FALSE, which generates relatively
little output to the R console during execution.

Value

A list of class ideal with named components:
n: numeric, integer, the number of legislators in the analysis, after
any subsetting via processing of the dropList.

m: numeric, integer, the number of rollcalls in the roll call matrix,
after any subsetting via processing of the dropList.

d: numeric, integer, the number of dimensions fitted.

x: a matrix containing the MCMC samples of the ideal point of each
legislator in each dimension, for each iteration from burnin to maxiter,
at an interval of thin. Rows of the x matrix index iterations; columns
index legislators.

beta: a matrix containing the MCMC samples of the item discrimination
parameter for each item in each dimension, plus an intercept, for each
iteration from burnin to maxiter, at an interval of thin. Rows of the
beta matrix index MCMC iterations; columns index parameters.

xbar: a matrix containing the means of the MCMC samples of the ideal
point of each legislator in each dimension, using iterations burnin to
maxiter, at an interval of thin; i.e., the column means of x.

betabar: a matrix containing the means of the MCMC samples of the
vote-specific parameters, using iterations burnin to maxiter, at an
interval of thin; i.e., the column means of beta.

call: an object of class call, containing the arguments passed to ideal
as unevaluated expressions.
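For instance, a quick sketch of inspecting these components after a
model fit (using the id1 object created in the Examples section below):

## assuming id1 is a fitted ideal object, as in the Examples below
dim(id1$x)      ## MCMC samples of ideal points: iterations by legislators (d=1)
head(id1$xbar)  ## posterior mean ideal points, the column means of id1$x
id1$call        ## the unevaluated call that produced the object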
Details

ideal fits a d+1 parameter item-response model to the roll call data
object, so in one dimension the model reduces to the two-parameter
item-response model popular in educational testing. See References.
Identification: The model parameters are not identified without the user
supplying some restrictions on the model parameters (e.g., translations,
rotations and re-scalings of the ideal points are observationally
equivalent, via offsetting transformations of the item parameters). It
is the user's responsibility to impose these identifying restrictions if
desired; the following brief discussion provides some guidance.

For one-dimensional models (i.e., d=1), a simple route to identification
is the normalize option, which guarantees local identification
(identification up to a 180 degree rotation of the recovered dimension).
Near-degenerate "spike" priors (priors with arbitrarily large
precisions), or use of the constrain.legis option on any two
legislators' ideal points, ensures global identification.
Identification in higher dimensions can be obtained by supplying fixed
values for d+1 legislators' ideal points, provided the supplied points
span a d-dimensional space (e.g., three supplied ideal points form a
triangle in d=2 dimensions), via the constrain.legis option. In this
case the function defaults to vague normal priors, but at each iteration
the sampled ideal points are transformed back into the space of
identified parameters, applying the linear transformation that maps the
d+1 fixed ideal points from their sampled values to their fixed values.
Alternatively (and equivalently), one can impose restrictions on the
item parameters via constrain.items. See the examples in the
documentation for the constrain.legis and constrain.items functions.
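For example, here is a minimal sketch of one-dimensional identification
via two constrained legislators, using the s109 data from the Examples
below; the legislator names are illustrative and must match row names of
the legislator data in your rollcall object:

data(s109)
## pin two legislators' ideal points, fixing location, scale and polarity
cl <- constrain.legis(s109,
                      x = list("KENNEDY (D MA)" = -1,
                               "ENZI (R WY)" = 1),
                      d = 1)
idConstrained <- ideal(s109, d = 1,
                       priors = cl,    ## priors implied by the constraints
                       startvals = cl, ## start values consistent with them
                       maxiter = 5000, burnin = 500, thin = 25)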
Another route to identification is via post-processing. That is, the
user can run ideal without any identification constraints (this poses no
formal or technical problem in a Bayesian analysis: the posterior
density is still well defined and can be explored via MCMC methods), and
then use the function postProcess to map the MCMC output from the space
of unidentified parameters into the subspace of identified parameters.
See the example in the documentation for the postProcess function. When
the normalize option is set to TRUE, an unidentified model is run, the
ideal object is post-processed with the normalize option, and the result
is returned to the user (but again, note that the normalize option is
only implemented for unidimensional models).
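A minimal sketch of that workflow: run an unconstrained model, then
normalize ex post (the short run length is illustrative only):

idRaw <- ideal(s109, d = 1, maxiter = 5000, burnin = 500, thin = 25)
## map each recorded iteration to ideal points with mean zero, sd one
idNorm <- postProcess(idRaw, constraints = "normalize")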
Start values. Start values can be supplied by the user, or generated by
the function itself.

The default method, corresponding to startvals="eigen", first forms an
n-by-n correlation matrix from the double-centered roll call matrix
(subtracting row means and column means, adding back the grand mean),
and then extracts the first d principal components (eigenvectors),
scaling each eigenvector by the square root of its corresponding
eigenvalue.
If the user is imposing constraints on ideal points (via
constrain.legis), these are applied to the corresponding elements of the
start values generated from the eigen decomposition. Then, to generate
start values for the rollcall/item parameters, a series of binomial glms
is estimated (with a probit link), one for each rollcall/item, $j = 1,
\ldots, m$. The votes on the $j$-th rollcall/item are binary responses
(presumed to be conditionally independent given each legislator's latent
preference), and the (constrained or unconstrained) start values for the
legislators are used as predictors. The estimated coefficients from
these probit models are stored to serve as start values for the item
discrimination and difficulty parameters (with the intercepts from the
probit GLMs multiplied by -1 so as to make those coefficients difficulty
parameters).
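The following rough sketch illustrates the two-step recipe just
described; it is illustrative only, and differs from pscl's internal
implementation in details such as the handling of missing votes:

## Step 1: "eigen" start values from the double-centered roll call matrix;
## Y is an n-by-m matrix of 0/1 votes (missing votes crudely zeroed here)
eigenStart <- function(Y, d = 1) {
  Yc <- Y - rowMeans(Y, na.rm = TRUE)                 ## subtract row means
  Yc <- sweep(Yc, 2, colMeans(Y, na.rm = TRUE), "-")  ## subtract column means
  Yc <- Yc + mean(Y, na.rm = TRUE)                    ## add back the grand mean
  Yc[is.na(Yc)] <- 0
  e <- eigen(cor(t(Yc)), symmetric = TRUE)            ## n-by-n correlation matrix
  ## scale the first d eigenvectors by the square roots of their eigenvalues
  e$vectors[, 1:d, drop = FALSE] %*% diag(sqrt(e$values[1:d]), nrow = d)
}

## Step 2: one probit glm per item, with the start values as predictors;
## the negated intercept serves as the difficulty start value
probitStart <- function(Y, x0) {
  t(apply(Y, 2, function(y) {
    cf <- coef(glm(y ~ x0, family = binomial(link = "probit")))
    c(cf[-1], -cf[1])  ## discrimination parameter(s), then difficulty
  }))
}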
The default eigen method generates extremely good start values for
low-dimensional models fit to recent U.S. congresses (where high rates
of party-line voting mean that low-dimensional models fit well). The
eigen method may be computationally expensive or even impossible to
implement for rollcall objects with large numbers of legislators.
The random method generates start values via iid sampling from a N(0,1)
density, using rnorm, then imposes any constraints that may have been
supplied via constrain.legis, and finally uses the probit method
described above to get start values for the rollcall/item parameters.
If startvals is a list, it must contain the components x and/or b, each
of which should be a matrix. The component x must be of dimensions equal
to the number of individuals (legislators) by d. If supplied,
startvals$b must be of dimensions number of items (votes) by d+1. The x
and b components cannot contain NA.
If x is not supplied when startvals is a list, then start values are
generated using the default eigen method described above, and start
values for the rollcall/item parameters are regenerated using the probit
method, ignoring any user-supplied values in startvals$b. That is,
user-supplied values in startvals$b are only used when accompanied by a
valid set of start values for the ideal points in startvals$x.
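For example, a sketch of supplying start values as a list for a
one-dimensional fit; the party-based x0 mirrors the Examples below, and
has the required dimensions (number of legislators by d):

x0 <- matrix(0, nrow = s109$n, ncol = 1)
x0[s109$legis.data$party == "D", 1] <- -1
x0[s109$legis.data$party == "R", 1] <- 1
idStart <- ideal(s109, d = 1, startvals = list(x = x0),
                 maxiter = 5000, burnin = 500, thin = 25)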
Implementation via Data Augmentation. The MCMC algorithm for this problem consists of a Gibbs sampler for the ideal points (latent traits) and item parameters, conditional on latent data $y^*$, generated via a data augmentation (DA) step. That is, following Albert (1992) and Albert and Chib (1993), if $y_{ij} = 1$ we sample from the truncated normal density $$y_{ij}^* \sim N(x_i' \beta_j - \alpha_j, 1)\mathcal{I}(y_{ij}^* \geq 0)$$ and for $y_{ij}=0$ we sample $$y_{ij}^* \sim N(x_i' \beta_j - \alpha_j, 1)\mathcal{I}(y_{ij}^* < 0)$$ where $\mathcal{I}$ is an indicator function evaluating to one if its argument is true and zero otherwise. Given the latent $y^*$, the conditional distributions for $x$ and $(\beta,\alpha)$ are extremely simple to sample from; see the references for details.
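An illustrative sketch of a single truncated-normal draw of this kind,
via the inverse-CDF method (not pscl's internal code):

## ystar ~ N(mu, 1) truncated to [0, Inf) if y == 1, to (-Inf, 0) if y == 0,
## where mu = x_i' beta_j - alpha_j
rlatent <- function(y, mu) {
  if (y == 1) {
    u <- runif(1, pnorm(0, mean = mu), 1)  ## mass at or above zero
  } else {
    u <- runif(1, 0, pnorm(0, mean = mu))  ## mass below zero
  }
  qnorm(u, mean = mu)
}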
This Gibbs-plus-DA strategy is easily implemented, but can sometimes
require many thousands of samples in order to generate tolerable
explorations of the marginal posterior densities of the latent traits,
particularly for legislators with short and/or extreme voting histories
(the equivalent in the educational testing setting is a test-taker who
gets many items right or wrong). The MCMC algorithm can generate better
performance via a parameter expansion strategy usually referred to as
marginal data augmentation (e.g., van Dyk and Meng 2001). The idea is to
introduce an additional working parameter into the MCMC sampler that has
the effect of improving the performance of the sampler in the sub-space
of parameters of direct interest. In this case we introduce a variance
parameter $\sigma^2$ for the latent data; in the DA algorithm of Albert
and Chib (1993) this parameter is set to 1.0 for identification. In the
MDA approach we carry this (unidentified) parameter into the DA stage of
the algorithm with an improper prior, $p(\sigma^2) \propto \sigma^{-2}$,
generating $y^*$ that exhibit bigger moves from iteration to iteration,
such that, in turn, the MCMC algorithm displays better mixing with
respect to the identified parameters of direct interest, $x$ and
$(\beta,\alpha)$, than the Gibbs-with-DA MCMC algorithm. The MDA
algorithm is the default in ideal, but Gibbs-with-DA can be implemented
by setting mda=FALSE in the call to ideal.
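For instance, a sketch comparing the two samplers (deliberately short
runs):

idMDA <- ideal(s109, d = 1, maxiter = 5000, burnin = 500, thin = 25)
idDA  <- ideal(s109, d = 1, mda = FALSE,
               maxiter = 5000, burnin = 500, thin = 25)
## inspect mixing in the iterative histories with tracex(), e.g.,
## tracex(idMDA, legis = "KENNEDY (D MA)")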
References

Albert, James H. 1992. Bayesian Estimation of Normal Ogive Item Response Curves Using Gibbs Sampling. Journal of Educational Statistics. 17:251-269.

Albert, James H. and Siddhartha Chib. 1993. Bayesian Analysis of Binary and Polychotomous Response Data. Journal of the American Statistical Association. 88:669-679.

Clinton, Joshua, Simon Jackman and Douglas Rivers. 2004. The Statistical Analysis of Roll Call Data. American Political Science Review. 98:355-370.

Jackman, Simon. 2009. Bayesian Analysis for the Social Sciences. Hoboken, New Jersey: Wiley.

Patz, Richard J. and Brian W. Junker. 1999. A Straightforward Approach to Markov Chain Monte Carlo Methods for Item Response Models. Journal of Educational and Behavioral Statistics. 24:146-178.

Rivers, Douglas. 2003. Identification of Multidimensional Spatial Voting Models. Typescript. Department of Political Science, Stanford University.

van Dyk, David A. and Xiao-Li Meng. 2001. The Art of Data Augmentation (with discussion). Journal of Computational and Graphical Statistics. 10(1):1-111.
See Also

rollcall, summary.ideal, plot.ideal, predict.ideal.

tracex for graphical display of the MCMC iterative history.

idealToMCMC converts the MCMC iterates in an ideal object to a form that
can be used by the coda package.

constrain.items and constrain.legis for implementing identifying
restrictions.

postProcess for imposing identifying restrictions ex post.

MCMCirt1d and MCMCirtKd in the MCMCpack package provide similar
functionality to ideal.
Examples

data(s109)

## ridiculously short run for examples
n <- dim(s109$legis.data)[1]
x0 <- rep(0, n)
x0[s109$legis.data$party == "D"] <- -1
x0[s109$legis.data$party == "R"] <- 1

id1 <- ideal(s109,
             d = 1,
             startvals = list(x = x0),
             normalize = TRUE,
             store.item = TRUE,
             maxiter = 100,
             burnin = 0,
             thin = 10,
             verbose = TRUE)
summary(id1)

## more realistic long run
idLong <- ideal(s109,
                d = 1,
                priors = list(xpv = 1e-12, bpv = 1e-12),
                normalize = TRUE,
                store.item = TRUE,
                maxiter = 260e3,
                burnin = 1e4,
                thin = 100)