
pscl (version 1.02)

ideal: analysis of educational testing data and roll call data with IRT models, via Markov chain Monte Carlo methods

Description

Analysis of rollcall data via the spatial voting model; analogous to fitting educational testing data via an item-response model. Model fitting via Markov chain Monte Carlo (MCMC).

Usage

ideal(object, codes = object$codes,
      dropList = list(codes = "notInLegis", lop = 0),
      d = 1, maxiter = 10000, thin = 100, burnin = 5000,
      impute = FALSE,
      mda = TRUE,
      normalize = FALSE,
      meanzero = normalize,
      priors = NULL, startvals = "eigen",
      store.item = FALSE, file = NULL,
      verbose = FALSE)

Arguments

object
an object of class rollcall
codes
a list describing the types of voting decisions in the roll call matrix (the votes component of the rollcall object); the default is object$codes, the codes component of the rollcall object.
dropList
a list (or alist) listing voting decisions, legislators and/or votes to be dropped from the analysis; see dropRollCall.
d
numeric, (small) positive integer (defaults to 1); the number of dimensions to fit.
maxiter
numeric, positive integer, a multiple of thin; the total number of MCMC iterations.
thin
numeric, positive integer, thinning interval used for recording MCMC iterations.
burnin
number of MCMC iterations to run before recording. The iteration numbered burnin will be recorded. Must be a multiple of thin.
impute
logical, whether to treat missing entries of the rollcall matrix as missing at random, sampling from the predictive density of the missing entries at each MCMC iteration.
mda
logical, whether to perform marginal data augmentation (see Details); default is TRUE.
normalize
logical, impose identification with the constraint that the ideal points have mean zero and standard deviation one. This option is only functional for unidimensional models (i.e., d=1).
meanzero
to be deprecated/ignored; use normalize instead.
priors
a list of parameters (means and variances) specifying normal priors for the legislators' ideal points. The default is NULL, in which case the normal priors used have mean zero and variance 1 for the ideal points (abilities).
startvals
either a string naming a method for generating start values, valid options being "eigen" (the default) or "random"; or a list containing start values for legislators' ideal points and item parameters. See Details.
store.item
logical, whether item discrimination parameters should be stored. Storing item discrimination parameters can consume a large amount of memory. These need to be stored for prediction; see predict.ideal.
file
string, file to write MCMC output. Default is NULL, in which case MCMC output is stored in memory. Note that post-estimation commands like plot will not work unless MCMC output is stored in memory.
verbose
logical, default is FALSE, which generates relatively little output to the R console during execution.

Value

  • a list of class ideal with named components:
  • n: numeric, integer, the number of legislators in the analysis, after any subsetting via processing of the dropList.
  • m: numeric, integer, the number of roll calls in the roll call matrix, after any subsetting via processing of the dropList.
  • d: numeric, integer, the number of dimensions fitted.
  • x: a matrix containing the MCMC samples for the ideal point of each legislator in each dimension, for each iteration from burnin to maxiter at an interval of thin. Rows of the x matrix index iterations; columns index legislators.
  • beta: a matrix containing the MCMC samples for the item discrimination parameters for each item in each dimension, plus an intercept, for each iteration from burnin to maxiter at an interval of thin. Rows of the beta matrix index MCMC iterations; columns index parameters.
  • xbar: a matrix containing the means of the MCMC samples for the ideal point of each legislator in each dimension, using iterations burnin to maxiter at an interval of thin; i.e., the column means of x.
  • betabar: a matrix containing the means of the MCMC samples for the vote-specific parameters, using iterations burnin to maxiter at an interval of thin; i.e., the column means of beta.
  • call: an object of class call, containing the arguments passed to ideal as unevaluated expressions.
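
A minimal sketch of inspecting these components, assuming a fitted object id1 as produced in the Examples section below:

id1$n; id1$m; id1$d   ## numbers of legislators, roll calls, and dimensions analysed
dim(id1$x)            ## stored MCMC samples of the ideal points, as described above
head(id1$xbar)        ## posterior mean ideal point for each legislator
id1$call              ## the (unevaluated) call that produced the fit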

Details

The function fits a d+1 parameter item-response model to the roll call data object, so in one dimension the model reduces to the two-parameter item-response model popular in educational testing. See References.

Identification: The model parameters are not identified without the user supplying some restrictions on the model parameters (e.g., translations, rotations and re-scalings of the ideal points are observationally equivalent, via offsetting transformations of the item parameters). It is the user's responsibility to impose these identifying restrictions if desired; the following brief discussion provides some guidance.

For one-dimensional models (i.e., d=1), a simple route to identification is the normalize option, which guarantees local identification (identification up to a 180-degree rotation of the recovered dimension). Near-degenerate spike priors (priors with arbitrarily large precisions), or the constrain.legis option applied to any two legislators' ideal points, ensures global identification.
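
For example, a hedged sketch of the constrain.legis route for a one-dimensional model of the s109 data follows; the two legislator labels are assumptions and should be checked against the legislator names in the data, and the pattern of passing the result as both priors and startvals follows the constrain.legis documentation:

data(s109)
cl <- constrain.legis(s109,
                      x = list("KENNEDY (D MA)" = -1,   ## assumed legislator label
                               "ENZI (R WY)" = 1),      ## assumed legislator label
                      d = 1)
## constraints enter ideal as (spike) priors and matching start values
idConstrained <- ideal(s109, d = 1,
                       priors = cl,
                       startvals = cl,
                       maxiter = 1000, burnin = 0, thin = 10)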

Identification in higher dimensions can be obtained by supplying fixed values for d+1 legislators' ideal points, provided the supplied points span a d-dimensional space (e.g., three supplied ideal points form a triangle in d=2 dimensions), via the constrain.legis option. In this case the function defaults to vague normal priors, but at each iteration the sampled ideal points are transformed back into the space of identified parameters, applying the linear transformation that maps the d+1 fixed ideal points from their sampled values to their fixed values. Alternatively (and equivalently), one can impose restrictions on the item parameters via constrain.items. See the examples in the documentation for constrain.legis and constrain.items.

Another route to identification is via post-processing. That is, the user can run ideal without any identification constraints (this poses no formal or technical problem in a Bayesian analysis: the posterior density is still well defined and can be explored via MCMC methods), and then use the function postProcess to map the MCMC output from the space of unidentified parameters into the subspace of identified parameters. See the example in the documentation for the postProcess function. When the normalize option is set to TRUE, an unidentified model is run, the resulting ideal object is post-processed with the normalize option, and the result is returned to the user (but again, note that the normalize option is only implemented for unidimensional models).
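
A hedged sketch of this post-processing route, using the s109 example data and the "normalize" constraint described above:

data(s109)
## deliberately unidentified (and very short) run
idRaw <- ideal(s109, d = 1,
               normalize = FALSE,
               store.item = TRUE,
               maxiter = 1000, burnin = 0, thin = 10)
## map the stored samples into the space of identified parameters
idNorm <- postProcess(idRaw, constraints = "normalize")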

Start values: Start values can be supplied by the user, or generated by the function itself. The default method, corresponding to startvals="eigen", first forms an n-by-n correlation matrix from the double-centered roll call matrix (subtracting row means and column means, and adding back the grand mean), and then extracts the first d principal components (eigenvectors), scaling each eigenvector by the square root of its corresponding eigenvalue. If the user is imposing constraints on ideal points (via constrain.legis), these are applied to the corresponding elements of the start values generated from the eigen-decomposition. Then, to generate start values for the rollcall/item parameters, a series of binomial GLMs (with a probit link) is estimated, one for each rollcall/item, $j = 1, \ldots, m$. The votes on the $j$-th rollcall/item are the binary responses (presumed to be conditionally independent given each legislator's latent preference), and the (constrained or unconstrained) start values for legislators are used as predictors. The estimated coefficients from these probit models serve as start values for the item discrimination and difficulty parameters (with the intercepts from the probit GLMs multiplied by -1 so as to make those coefficients difficulty parameters).
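
The following is a rough sketch of this recipe in plain R; it is not the package's internal code. Here V stands for a hypothetical n-by-m matrix of 0/1 votes (NA for missing entries), and the column ordering of the returned item parameters is illustrative only:

eigenStart <- function(V, d = 1) {
  Vc <- V
  Vc[is.na(Vc)] <- mean(Vc, na.rm = TRUE)          ## crude fill-in for missing votes
  dc <- sweep(Vc, 1, rowMeans(Vc))                 ## subtract row means
  dc <- sweep(dc, 2, colMeans(Vc))                 ## subtract column means
  dc <- dc + mean(Vc)                              ## add back the grand mean
  e  <- eigen(cor(t(dc)))                          ## n-by-n correlation matrix of legislators
  x0 <- e$vectors[, 1:d, drop = FALSE] %*% diag(sqrt(e$values[1:d]), d)
  ## one probit GLM per roll call, with the start values as predictors
  b0 <- t(apply(V, 2, function(y) {
    ok <- !is.na(y)
    coef(glm(y[ok] ~ x0[ok, ], family = binomial(link = "probit")))
  }))
  b0[, 1] <- -b0[, 1]                              ## intercepts become difficulty parameters
  list(x = x0, b = cbind(b0[, -1, drop = FALSE], b0[, 1]))
}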

The default eigen method generates extremely good start values for low-dimensional models fit to recent U.S. congresses (where high rates of party line voting mean low dimensional models fit well). The eigen method may be computationally expensive or even impossible to implement for rollcall objects with large numbers of legislators.

The random method generates start values via iid sampling from an N(0,1) density using rnorm, imposes any constraints that may have been supplied via constrain.legis, and then uses the probit method described above to generate start values for the rollcall/item parameters. If startvals is a list, it must contain the components x and/or b, each of which should be a matrix. The component x must have dimensions equal to the number of individuals (legislators) by d. If supplied, startvals$b must have dimensions equal to the number of items (votes) by d+1. The x and b components cannot contain NAs. If x is not supplied when startvals is a list, then start values are generated using the default eigen method described above, and start values for the rollcall/item parameters are regenerated using the probit method, ignoring any user-supplied values in startvals$b. That is, user-supplied values in startvals$b are only used when accompanied by a valid set of start values for the ideal points in startvals$x.
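
A minimal sketch of the list form of startvals; the counts below are hypothetical placeholders for the post-dropList numbers of legislators and votes:

n  <- 100                                    ## hypothetical number of legislators analysed
m  <- 500                                    ## hypothetical number of votes analysed
sv <- list(x = matrix(rnorm(n), n, 1),       ## n-by-d matrix of ideal point start values
           b = matrix(rnorm(m * 2), m, 2))   ## m-by-(d+1) matrix of item start values
## sv$b is honoured only because sv$x is also supplied; a list with b alone is ignored
## ideal(rc, d = 1, startvals = sv, ...)     ## rc: a rollcall object with matching dimensions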

Implementation via Data Augmentation. The MCMC algorithm for this problem consists of a Gibbs sampler for the ideal points (latent traits) and item parameters, conditional on latent data $y^*$, generated via a data augmentation (DA) step. That is, following Albert (1992) and Albert and Chib (1993), if $y_{ij} = 1$ we sample from the truncated normal density $$y_{ij}^* \sim N(x_i' \beta_j - \alpha_j, 1)\mathcal{I}(y_{ij}^* \geq 0)$$ and for $y_{ij}=0$ we sample $$y_{ij}^* \sim N(x_i' \beta_j - \alpha_j, 1)\mathcal{I}(y_{ij}^* < 0)$$ where $\mathcal{I}$ is an indicator function evaluating to one if its argument is true and zero otherwise. Given the latent $y^*$, the conditional distributions for $x$ and $(\beta,\alpha)$ are extremely simple to sample from; see the references for details.
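
A minimal sketch of this truncation step via inverse-CDF sampling (not the package's internal code); y is a vector of observed 0/1 votes and mu the corresponding values of $x_i' \beta_j - \alpha_j$:

rtruncLatent <- function(y, mu) {
  u  <- runif(length(mu))
  lo <- ifelse(y == 1, pnorm(-mu), 0)   ## standard-normal CDF at the truncation point
  hi <- ifelse(y == 1, 1, pnorm(-mu))
  mu + qnorm(lo + u * (hi - lo))        ## y* >= 0 when y = 1, y* < 0 when y = 0
}

## e.g., latent data for three votes with linear predictors 0.3, -1.2 and 2.0
rtruncLatent(c(1, 0, 1), c(0.3, -1.2, 2.0))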

This Gibbs-plus-DA strategy is easily implemented, but can sometimes require many thousands of samples in order to generate tolerable explorations of the marginal posterior densities of the latent traits, particularly for legislators with short and/or extreme voting histories (the equivalent in the educational testing setting is a test-taker who gets many items right or wrong). The MCMC algorithm can be made to perform better via a parameter-expansion strategy usually referred to as marginal data augmentation (MDA; e.g., van Dyk and Meng 2001). The idea is to introduce an additional working parameter into the MCMC sampler that improves the sampler's performance in the sub-space of parameters of direct interest. In this case we introduce a variance parameter $\sigma^2$ for the latent data; in the DA algorithm of Albert and Chib (1993) this parameter is set to 1.0 for identification. In the MDA approach we carry this (unidentified) parameter into the DA stage of the algorithm with an improper prior, $p(\sigma^2) \propto \sigma^{-2}$, generating $y^*$ that exhibit bigger moves from iteration to iteration; in turn, the MCMC algorithm mixes better with respect to the identified parameters of direct interest, $x$ and $(\beta,\alpha)$, than the Gibbs-with-DA algorithm. The MDA algorithm is the default in ideal, but Gibbs-with-DA can be used by setting mda=FALSE in the call to ideal.
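
As a rough, hedged illustration (the runs below are far too short for real inference), one might fit the same model with and without MDA and compare effective sample sizes via the coda package:

data(s109)
idMDA <- ideal(s109, d = 1, normalize = TRUE, mda = TRUE,
               maxiter = 2000, burnin = 0, thin = 10)
idDA  <- ideal(s109, d = 1, normalize = TRUE, mda = FALSE,
               maxiter = 2000, burnin = 0, thin = 10)
library(coda)
effectiveSize(idealToMCMC(idMDA))[1:5]   ## typically larger...
effectiveSize(idealToMCMC(idDA))[1:5]    ## ...than without MDA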

References

Albert, James. 1992. Bayesian Estimation of normal ogive item response curves using Gibbs sampling. Journal of Educational Statistics. 17:251-269.

Albert, James H. and Siddhartha Chib. 1993. Bayesian Analysis of Binary and Polychotomous Response Data. Journal of the American Statistical Association. 88:669-679.

Clinton, Joshua, Simon Jackman and Douglas Rivers. 2004. The Statistical Analysis of Roll Call Data. American Political Science Review. 98:355-370.

Jackman, Simon. 2009. Bayesian Analysis for the Social Sciences. Wiley: Hoboken, New Jersey.

Patz, Richard J. and Brian W. Junker. 1999. A Straightforward Approach to Markov Chain Monte Carlo Methods for Item Response Models. Journal of Educational and Behavioral Statistics. 24:146-178.

Rivers, Douglas. 2003. Identification of Multidimensional Item-Response Models. Typescript. Department of Political Science, Stanford University.

van Dyk, David A and Xiao-Li Meng. 2001. The art of data augmentation (with discussion). Journal of Computational and Graphical Statistics. 10(1):1-111.

See Also

rollcall, summary.ideal, plot.ideal, predict.ideal. tracex for graphical display of MCMC iterative history.

idealToMCMC converts the MCMC iterates in an ideal object to a form that can be used by the coda library.

constrain.items and constrain.legis for implementing identifying restrictions.

postProcess for imposing identifying restrictions ex post. MCMCirt1d and MCMCirtKd in the MCMCpack package provide similar functionality to ideal.

Examples

data(s109)

## ridiculously short run for examples
n <- dim(s109$legis.data)[1]
x0 <- rep(0,n)
x0[s109$legis.data$party=="D"] <- -1
x0[s109$legis.data$party=="R"] <- 1

id1 <- ideal(s109,
             d=1,
             startvals=list(x=x0),
             normalize=TRUE,
             store.item=TRUE,
             maxiter=100,
             burnin=0,
             thin=10,
             verbose=TRUE)  
summary(id1)

## more realistic long run
idLong <- ideal(s109,
                d=1,
                priors=list(xpv=1e-12,bpv=1e-12),
                normalize=TRUE,
                store.item=TRUE,
                maxiter=260e3,
                burnin=1e4,
                thin=100)
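
## post-estimation summaries for the long run above; summary.ideal, plot.ideal
## and idealToMCMC are documented under See Also
summary(idLong)
plot(idLong)
idLongMCMC <- idealToMCMC(idLong)   ## coda-compatible object for further diagnostics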
