idealstan
This function takes a pre-processed idealdata vote/score data frame and runs one of the available IRT/latent space ideal point models on the data using Stan's MCMC engine.
id_estimate(idealdata = NULL, model_type = 2, inflate_zero = FALSE,
vary_ideal_pts = "none", use_subset = FALSE, sample_it = FALSE,
subset_group = NULL, subset_person = NULL, sample_size = 20,
nchains = 4, niters = 2000, use_vb = FALSE,
restrict_ind_high = NULL, id_diff = 4, id_diff_high = 2,
restrict_ind_low = NULL, fixtype = "vb_full", id_refresh = 0,
prior_fit = NULL, warmup = floor(niters/2), ncores = 4,
use_groups = FALSE, discrim_reg_sd = 2, discrim_miss_sd = 2,
person_sd = 1, time_sd = 0.1, sample_stationary = FALSE,
ar_sd = 2, diff_reg_sd = 1, diff_miss_sd = 1, restrict_sd = 0.01,
restrict_mean = NULL, restrict_var = NULL,
restrict_mean_val = NULL, restrict_mean_ind = NULL,
restrict_var_high = 0.1, tol_rel_obj = 0.001, gp_sd_par = 0.025,
gp_num_diff = c(3, 0.01), gp_m_sd_par = c(0.3, 10),
gp_min_length = 0, ...)
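As a rough sketch of what the sampler defaults in the signature above imply, the following base R snippet computes the default warmup and the resulting number of post-warmup draws. It assumes that niters counts the warmup iterations, as the iter argument of rstan's sampler does; the helper sampler_budget is hypothetical and not part of idealstan.

```r
# Hypothetical helper (not part of idealstan): summarise the sampling budget
# implied by niters, warmup and nchains.
# Assumption: niters includes the warmup iterations, as rstan's `iter` does.
sampler_budget <- function(niters = 2000, nchains = 4,
                           warmup = floor(niters / 2)) {
  stopifnot(niters > warmup, nchains >= 1)
  list(warmup = warmup,
       draws_per_chain = niters - warmup,
       total_draws = (niters - warmup) * nchains)
}

# Budget under the defaults shown in the usage signature
default_budget <- sampler_budget()

# A shorter run near the suggested lower bound of about 500 iterations
short_budget <- sampler_budget(niters = 501, nchains = 2)

print(default_budget$total_draws)
print(short_budget$total_draws)
```

Under this assumption, halving niters also halves the warmup unless warmup is set explicitly.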
idealdata: An object produced by the id_make function containing a score/vote matrix for use in estimation and plotting.
model_type: An integer reflecting the kind of model to be estimated. See below.
inflate_zero: If the outcome is distributed as Poisson (count/unbounded integer), setting this to TRUE will fit a traditional zero-inflated model. To use it correctly, the value for zero must be passed as the miss_val option to id_make before running a model so that zeroes are coded as missing data.
vary_ideal_pts: Default 'none'. If 'random_walk', 'AR1' or 'GP', a time-varying ideal point model will be fit with a random-walk process, an AR1 process or a Gaussian process, respectively. See documentation for more info.
use_subset: Whether a subset of the legislators/persons should be used instead of the full response matrix.
sample_it: Whether or not to use a random subsample of the response matrix. Useful for testing.
subset_group: If person/legislative data was included in the id_make function, then you can subset by any value in the $group column of that data if use_subset is TRUE.
subset_person: A list of character values of names of persons/legislators to use to subset if use_subset is TRUE and person/legislative data was included in the id_make function with the required $person.names column.
sample_size: If sample_it is TRUE, this value reflects how many legislators/persons will be sampled from the response matrix.
nchains: The number of chains to use in Stan's sampler. Minimum is one. See stan for more info.
niters: The number of iterations to run Stan's sampler. Shouldn't be set much lower than 500. See stan for more info.
use_vb: Whether or not to use Stan's variational Bayesian inference engine instead of full Bayesian inference. Pros: it's much faster. Cons: it's not quite as accurate. See vb for more info.
restrict_ind_high: If fixtype is not "vb", the particular indices of legislators/persons or bills/items to constrain high.
id_diff: The fixed difference between the high/low person/legislator ideal points used to identify the model. Set at 4 as a standard value but can be changed to any arbitrary number without affecting model results besides re-scaling.
id_diff_high: The fixed intercept of the high ideal point used to constrain the model.
restrict_ind_low: If fixtype is not "vb", the particular indices of legislators/persons or bills/items to constrain low. (Note: not used if values are pinned).
fixtype: Sets the particular kind of identification used on the model. It can be one of 'vb_full' (identification provided exclusively by running a variational identification model with no prior info), 'vb_partial' (two indices of ideal points to fix are provided but the values to fix are determined by the identification model), 'constrain' (two indices of ideal points to fix are provided; only sufficient for the model if restrict_var is FALSE), and 'prior_fit' (a previously identified idealstan fit is passed to the prior_fit option and used as the basis for identification). See details for more information.
id_refresh: The number of times to report iterations from the variational run used to identify models. Default is 0 (nothing output to console).
prior_fit: If a previous idealstan model was fit with the same data, then the same identification constraints can be recycled from the prior fit if the idealstan object is passed to this option. Note that this means that all identification options, like restrict_var, will also be the same.
warmup: The number of iterations to use to calibrate Stan's sampler on a given model. Shouldn't be less than 100. See stan for more info.
ncores: The number of cores in your computer to use for parallel processing in the Stan engine. See stan for more info.
use_groups: If TRUE, group parameters from the person/legis data given in id_make will be estimated instead of individual parameters.
discrim_reg_sd: Set the prior standard deviation of the bimodal prior for the discrimination parameters for the non-inflated model.
discrim_miss_sd: Set the prior standard deviation of the bimodal prior for the discrimination parameters for the inflated model.
person_sd: Set the prior standard deviation for the legislators (persons) parameters.
time_sd: The precision (inverse variance) of the over-time component of the person/legislator parameters. A higher value will allow for less over-time variation (useful if estimates bounce too much). Default is 0.1, as shown in the usage above.
sample_stationary: If TRUE, the AR(1) coefficients in a time-varying model will be sampled from an unconstrained space and then mapped back to a stationary space. Setting this to TRUE is slower but will work better when there is limited information to identify a model. If used, the ar_sd parameter should be increased to 5 to allow for wider sampling in the unconstrained space.
ar_sd: If an AR(1) model is used, this defines the prior scale of the Normal distribution. A lower number can help identify the model when there are few time points.
diff_reg_sd: Set the prior standard deviation for the bill (item) intercepts for the non-inflated model.
diff_miss_sd: Set the prior standard deviation for the bill (item) intercepts for the inflated model.
restrict_sd: Set the prior standard deviation for constrained parameters.
restrict_mean: Whether or not to restrict the over-time mean of an ideal point (additional identification measure when standard fixes don't work). TRUE by default for random-walk models.
restrict_var: Whether to limit variance to no higher than 0.5 for random-walk time series models. If left blank (the default), will be set to TRUE for random-walk models and FALSE for AR(1) models if identification is still a challenge (note: using this for AR(1) models is probably overkill).
restrict_mean_val: For random-walk models, the mean of a time-series ideal point to constrain. Should not be set a priori (leave blank) unless you are absolutely sure. Otherwise it is set by the identification model.
restrict_mean_ind: For random-walk models, the ID of the person/group whose over-time mean to constrain. Should be left blank (will be set by identification model) unless you are really sure.
restrict_var_high: The upper limit for the variance parameter (if restrict_var=TRUE and the model is a random-walk time series). If left blank, either defaults to 0.1 or is set by the identification model.
tol_rel_obj: If use_vb is TRUE, this parameter sets the stopping rule for the vb algorithm. Its default is 0.001. A stricter threshold will require the sampler to run longer but may yield a better result in a difficult model with highly correlated parameters. Lowering the threshold should work fine for simpler models.
gp_sd_par: The upper limit on allowed residual variation of the Gaussian process prior. Increasing the limit will permit the GP to more closely follow the time points, resulting in much sharper bends in the function and potentially oscillation.
gp_num_diff: The number of time points to use to calculate the length-scale prior that determines the level of smoothness of the GP time process. Increasing this value will result in greater smoothness/autocorrelation over time by selecting a greater number of time points over which to calculate the length-scale prior.
gp_m_sd_par: The upper limit of the marginal standard deviation of the GP time process. Decreasing this value will result in smoother fits.
gp_min_length: The minimum value of the GP length-scale parameter. This is a hard lower limit. Increasing this value will force a smoother GP fit. It should always be less than gp_num_diff.
...: Additional parameters passed on to Stan's sampling engine. See stan for more information.
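The identification options above (restrict_ind_high, restrict_ind_low, id_diff, fixtype) exist because ideal point models are only determined up to reflection and rescaling. A minimal base R sketch of that invariance follows, assuming a textbook 2-PL form in which the response probability is plogis(discrimination * ideal point - difficulty). The exact parameterization inside idealstan may differ, and prob_matrix is a hypothetical helper, not a package function.

```r
# Hypothetical helper (not part of idealstan): persons-by-items matrix of
# response probabilities under an assumed textbook 2-PL parameterization.
prob_matrix <- function(ideal_pts, discrim, diff) {
  linear <- outer(ideal_pts, discrim) -
    matrix(diff, nrow = length(ideal_pts), ncol = length(diff), byrow = TRUE)
  plogis(linear)
}

set.seed(1)
ideal_pts <- rnorm(5)          # five persons
discrim <- c(1.5, -0.7, 0.3)   # three items
diff <- c(0.2, -1, 0.5)

p_orig <- prob_matrix(ideal_pts, discrim, diff)

# Reflection: flipping the sign of ideal points and discriminations
# leaves every probability unchanged.
p_flip <- prob_matrix(-ideal_pts, -discrim, diff)

# Rescaling: stretching ideal points by 2 and shrinking discriminations by 2
# also leaves every probability unchanged.
p_scaled <- prob_matrix(2 * ideal_pts, discrim / 2, diff)

print(isTRUE(all.equal(p_orig, p_flip)))
print(isTRUE(all.equal(p_orig, p_scaled)))
```

Because the data cannot distinguish these configurations, pinning one person high and another low a fixed distance apart selects a single orientation and scale, which is consistent with the note that changing id_diff only re-scales the results.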
A fitted idealstan object that contains posterior samples of all parameters either via full Bayesian inference or a variational approximation if use_vb is set to TRUE. This object can then be passed to the plotting functions for further analysis.
To run an IRT ideal point model, you must first pre-process your data using the id_make function. Be sure to specify the correct options for the kind of model you are going to run: if you want to run an unbounded outcome (i.e. Poisson or continuous), the data needs to be processed differently. Also, any hierarchical covariates at the person or item level need to be specified in id_make. If they are specified in id_make, then all subsequent models fit by this function will have these covariates.
Note that for static ideal point models, the covariates are only defined for those persons who are not being used as constraints.
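To make the pre-processing step more concrete, here is a small sketch of long-format vote data with one row per person-item pair. The column names are borrowed from the senate114 example further down; the toy values themselves are hypothetical. The id_make call is left commented out because it requires idealstan to be installed, and it only mirrors the arguments used in that example.

```r
# Hypothetical toy data: 3 persons voting on 2 items, one row per vote.
toy_votes <- data.frame(
  bioname = rep(c("Person A", "Person B", "Person C"), times = 2),
  rollnumber = rep(c(1, 2), each = 3),
  party_code = rep(c("X", "Y", "X"), times = 2),
  cast_code = c("Yes", "No", "Absent", "No", "Yes", "Yes"),
  stringsAsFactors = FALSE
)

# Share of outcomes that would be coded as missing if miss_val = 'Absent'
miss_share <- mean(toy_votes$cast_code == "Absent")
print(miss_share)

# With idealstan installed, the data could then be pre-processed along the
# lines of the senate114 example (not run here):
# toy_idealdata <- id_make(score_data = toy_votes,
#                          outcome = 'cast_code',
#                          person_id = 'bioname',
#                          item_id = 'rollnumber',
#                          group_id = 'party_code',
#                          high_val = 'Yes',
#                          low_val = 'No',
#                          miss_val = 'Absent')
```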
As of this version of idealstan, the following model types are available. Simply pass the number of the model in the list to the model_type option to fit the model.
1. IRT 2-PL (binary response) ideal point model, no missing-data inflation
2. IRT 2-PL (binary response) ideal point model with missing-data inflation
3. Ordinal IRT (rating scale) ideal point model, no missing-data inflation
4. Ordinal IRT (rating scale) ideal point model with missing-data inflation
5. Ordinal IRT (graded response) ideal point model, no missing-data inflation
6. Ordinal IRT (graded response) ideal point model with missing-data inflation
7. Poisson IRT (Wordfish) ideal point model, no missing-data inflation
8. Poisson IRT (Wordfish) ideal point model with missing-data inflation
9. Unbounded (Gaussian) IRT ideal point model, no missing-data inflation
10. Unbounded (Gaussian) IRT ideal point model with missing-data inflation
11. Positive-unbounded (Log-normal) IRT ideal point model, no missing-data inflation
12. Positive-unbounded (Log-normal) IRT ideal point model with missing-data inflation
13. Latent Space (binary response) ideal point model, no missing-data inflation
14. Latent Space (binary response) ideal point model with missing-data inflation
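Since model_type is passed as a position in the list above, a small lookup can help avoid off-by-one mistakes. The sketch below is plain base R; model_families, describe_model_type and is_inflated are hypothetical names, not idealstan functions, and the odd/even pattern is simply read off the ordering of the list.

```r
# Hypothetical lookup (not part of idealstan): the seven model families in
# the order they appear in the list above.
model_families <- c(
  "IRT 2-PL (binary response)",
  "Ordinal IRT (rating scale)",
  "Ordinal IRT (graded response)",
  "Poisson IRT (Wordfish)",
  "Unbounded (Gaussian) IRT",
  "Positive-unbounded (Log-normal) IRT",
  "Latent Space (binary response)"
)

# In the list, each family appears twice: odd numbers without missing-data
# inflation, even numbers with it.
is_inflated <- function(model_type) model_type %% 2 == 0

describe_model_type <- function(model_type) {
  stopifnot(length(model_type) == 1, model_type %in% 1:14)
  family <- model_families[ceiling(model_type / 2)]
  inflation <- if (is_inflated(model_type)) {
    "with missing-data inflation"
  } else {
    "no missing-data inflation"
  }
  paste(family, inflation, sep = ", ")
}

# The default model_type in the usage signature is 2
print(describe_model_type(2))
print(describe_model_type(7))
```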
Clinton, J., Jackman, S., & Rivers, D. (2004). The Statistical Analysis of Roll Call Data. The American Political Science Review, 98(2), 355-370. doi:10.1017/S0003055404001194
Bafumi, J., Gelman, A., Park, D., & Kaplan, N. (2005). Practical Issues in Implementing and Understanding Bayesian Ideal Point Estimation. Political Analysis, 13(2), 171-187. doi:10.1093/pan/mpi010
Kubinec, R. "Generalized Ideal Point Models for Time-Varying and Missing-Data Inference". Working Paper.
Betancourt, Michael. "Robust Gaussian Processes in Stan". (October 2017). Case Study.
id_make for pre-processing data, id_plot_legis for plotting results, summary for obtaining posterior quantiles, posterior_predict for producing predictive replications.
# NOT RUN {
# First we can simulate data for an IRT 2-PL model that is inflated for missing data
library(idealstan)
library(ggplot2)
library(dplyr)

# This code will take at least a few minutes to run
bin_irt_2pl_abs_sim <- id_sim_gen(model_type='binary', inflate=TRUE)

# Now we can put that directly into the id_estimate function
# to get full Bayesian posterior estimates
# For identification purposes, we will constrain the persons with the
# highest and lowest true simulated ideal points
bin_irt_2pl_abs_est <- id_estimate(bin_irt_2pl_abs_sim,
                                   model_type=2,
                                   restrict_ind_high =
                                     sort(bin_irt_2pl_abs_sim@simul_data$true_person,
                                          decreasing=TRUE,
                                          index.return=TRUE)$ix[1],
                                   restrict_ind_low =
                                     sort(bin_irt_2pl_abs_sim@simul_data$true_person,
                                          decreasing=FALSE,
                                          index.return=TRUE)$ix[1],
                                   fixtype='vb_partial',
                                   ncores=2,
                                   nchains=2)

# We can now see how well the model recovered the true parameters
id_sim_coverage(bin_irt_2pl_abs_est) %>%
  bind_rows(.id='Parameter') %>%
  ggplot(aes(y=avg, x=Parameter)) +
  stat_summary(fun.args=list(mult=1.96)) +
  theme_minimal()
# }
# NOT RUN {
# In most cases, we will use pre-existing data
# and we will need to use the id_make function first
# We will use the full rollcall voting data
# from the 114th Senate
data('senate114')

# Running this model will take at least a few minutes, even with
# variational inference (use_vb=TRUE) turned on
to_idealstan <- id_make(score_data = senate114,
                        outcome = 'cast_code',
                        person_id = 'bioname',
                        item_id = 'rollnumber',
                        group_id = 'party_code',
                        time_id = 'date',
                        high_val = 'Yes',
                        low_val = 'No',
                        miss_val = 'Absent')

sen_est <- id_estimate(to_idealstan,
                       model_type = 2,
                       use_vb = TRUE,
                       fixtype = 'vb_partial',
                       restrict_ind_high = "BARRASSO, John A.",
                       restrict_ind_low = "WARREN, Elizabeth")

# After running the model, we can plot
# the results of the person/legislator ideal points
id_plot_legis(sen_est)
# }