id_make: Create data to run IRT model

Description

To run an IRT model using idealstan, you must first process your data using the id_make function.

Usage

id_make(score_data = NULL, outcome = "outcome",
  person_id = "person_id", item_id = "item_id", time_id = "time_id",
  group_id = "group_id", simul_data = NULL, person_cov = NULL,
  group_cov = NULL, item_cov = NULL, item_cov_miss = NULL,
  miss_val = NA, high_val = NULL, low_val = NULL,
  middle_val = NULL, unbounded = FALSE, exclude_level = NA,
  simulation = FALSE)

Arguments

score_data

A data frame in long form (one row for each measured score or vote), or a rollcall data object from package pscl.

outcome

Column name of the outcome in score_data, default is "outcome".

person_id

Column name of the person/legislator ID index in score_data, default is 'person_id'. Should be integer, character or factor.

item_id

Column name of the item/bill ID index in score_data, default is 'item_id'. Should be integer, character or factor.

time_id

Column name of the time values in score_data: optional, default is 'time_id'. Should be a date or date-time class, but can be an integer (i.e., years in whole numbers).

group_id

Column name of the person/legislator group IDs (e.g., parties) in score_data. Optional, default is 'group_id'. Should be integer, character or factor.

simul_data

Optionally, data that has been generated by the id_sim_gen function.

person_cov

A one-sided formula that specifies the covariates in score_data that will be used to hierarchically model the person/legislator ideal points

group_cov

A one-sided formula that specifies the covariates in score_data that will be used to hierarchically model the person/legislator ideal points at the group level. Use this in place of person_cov if you intend to run a group-level model.

item_cov

A one-sided formula that specifies the covariates in score_data that will be used to hierarchically model the item/bill discrimination parameters for the regular model

item_cov_miss

A one-sided formula that specifies the covariates in the dataset that will be used to hierarchically model the item/bill discrimination parameters for the missing data model.

miss_val

The value (numeric or character) that indicates missing data/absences in the data. If missing data is coded as NA, simply leave this parameter at the default, NA.

high_val

The value (numeric or character) that indicates the highest discrete outcome possible, such as yes in a vote dataset or correct in a test examination.

low_val

The value (numeric or character) that indicates the lowest discrete outcome possible, such as no votes in a vote dataset or incorrect in a test examination.

middle_val

The value (numeric or character) that indicates a category between the lowest and highest categories, such as abstention in voting data or "Neither Agree nor Disagree" in Likert scales. If there are multiple such values, pass a numeric or character vector of all of them in the correct order (lower to higher values). If there are no middle values (binary outcome), leave empty.

unbounded

Whether or not the outcome/response is unbounded (i.e., continuous or Poisson). If it is, miss_val is recoded as the maximum of the outcome + 1.

exclude_level

A vector of any values that should be treated as NA in the response matrix. Unlike the miss_val parameter, these values will be dropped from the data before estimation rather than modeled explicitly.

simulation

If TRUE, simulated values are saved in the idealdata object for later plotting with the id_plot_sims function

Value

An idealdata object that can then be used in the id_estimate function to fit a model.

Details

This function accepts either a rollcall data object from package pscl or a long data frame in which each row is one item-person (bill-legislator) observation with its associated outcome. The long data frame is preferred because it permits the inclusion of a wide range of covariates in the model, such as person-varying and item-varying (bill-varying) covariates. If a rollcall object is passed to the function, the rollcall data is converted to a long data frame, with data from the vote.data matrix used to determine dates for bills. If passing a long data frame, you should give id_make the names of the columns containing the IDs for persons, items and groups (a group is an ID shared by multiple persons, such as a political party or a class), along with the name of the response/outcome column. The only required columns are the item/bill ID, the person/legislator ID and the outcome.
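As a minimal sketch of the long format described above, the following builds a toy data frame with one row per person-item observation. All values, and the covariate columns seniority and budget_bill, are hypothetical and chosen only for illustration. The id_make call is left commented out so that the block runs without idealstan installed; its arguments follow the Usage section on this page.

```r
# Toy long data frame: one row per legislator-bill observation.
# All names and values here are hypothetical.
votes <- data.frame(
  person_id   = c("Smith", "Smith", "Jones", "Jones"),
  item_id     = c("bill_1", "bill_2", "bill_1", "bill_2"),
  group_id    = c("D", "D", "R", "R"),
  outcome     = c("Yes", "No", "No", "Yes"),
  seniority   = c(12, 12, 3, 3),   # person-varying covariate
  budget_bill = c(1, 0, 1, 0),     # item-varying covariate
  stringsAsFactors = FALSE
)

# Each person-item pair should appear at most once in long form.
n_duplicates <- anyDuplicated(votes[c("person_id", "item_id")])

# With idealstan loaded, the call could then look like this:
# to_idealstan <- id_make(score_data = votes,
#                         outcome = "outcome",
#                         person_id = "person_id",
#                         item_id = "item_id",
#                         group_id = "group_id",
#                         person_cov = ~ seniority,
#                         item_cov = ~ budget_bill,
#                         high_val = "Yes",
#                         low_val = "No")
```

Because the covariates live in the same data frame as the outcome, a person-varying covariate is simply repeated on every row belonging to that person, and likewise for item-varying covariates.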

The preferred format for the outcome column for discrete variables (binary or ordinal) is a factor variable with levels in ascending order. For example, if using legislative data, the levels of the factor should be c('No','Yes'). If a different kind of variable is passed, such as a character or numeric variable, you should consider specifying low_val, high_val and middle_val to determine the correct order of the discrete outcome. Specifying middle_val is only necessary if you are estimating an ordinal model.
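The point about level order can be checked in base R before calling id_make. The sketch below uses a hypothetical three-category vote and shows why the levels should be set explicitly: by default, factor() sorts character levels alphabetically, which is rarely the intended ordering.

```r
# Hypothetical raw outcome with three categories.
raw_votes <- c("Yes", "No", "Abstain", "Yes")

# Default factor(): levels are sorted alphabetically.
default_levels <- levels(factor(raw_votes))

# Explicit levels in ascending order, lowest category first.
vote_factor <- factor(raw_votes, levels = c("No", "Abstain", "Yes"))

# The integer codes now follow the intended order.
vote_codes <- as.integer(vote_factor)
```

Here default_levels is "Abstain", "No", "Yes", while vote_factor has levels "No", "Abstain", "Yes" with codes 3, 1, 2, 3.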

If you do not specify a value for miss_val, then any NA are assumed to be missing. If you do specify miss_val and you also have NA in your data (assuming miss_val is not NA), then the function will treat the data coded as miss_val as missing data that should be modeled and will treat the NA data as ignorable missing data that will be removed (list-wise deletion) before estimating a model.
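The distinction between the two kinds of missingness can be illustrated with a toy outcome column. This sketch only mirrors the behavior described above using base R; it is not the package's internal code, and the value "Absent" is a hypothetical choice for miss_val.

```r
# Hypothetical outcome with both an explicit missing code and NA.
votes <- data.frame(
  person_id = c("Smith", "Smith", "Jones", "Jones"),
  item_id   = c("bill_1", "bill_2", "bill_1", "bill_2"),
  outcome   = c("Yes", "Absent", NA, "No"),
  stringsAsFactors = FALSE
)

# Rows coded as miss_val = "Absent": missingness to be modeled.
modeled_missing <- votes$outcome %in% "Absent"

# Rows coded NA: ignorable missingness, removed before estimation.
ignorable <- is.na(votes$outcome)

# What remains after dropping the ignorable rows.
votes_kept <- votes[!ignorable, ]
```

In this example one row is treated as modeled missing data and one row is dropped, leaving three rows for estimation.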

To run a time-varying model, you need to include the name of a column with dates (or integers) that is passed to the time_id option.
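A short sketch of preparing such a column, assuming the dates arrive as character strings. The column names vote_date and year are hypothetical; either could be passed as time_id, since the argument description allows a date class or whole-number integers.

```r
# Hypothetical long data with dates stored as text.
votes <- data.frame(
  person_id = c("Smith", "Jones", "Smith", "Jones"),
  item_id   = c("bill_1", "bill_1", "bill_2", "bill_2"),
  outcome   = c("Yes", "No", "No", "Yes"),
  vote_date = c("2015-01-08", "2015-01-08", "2015-03-12", "2015-03-12"),
  stringsAsFactors = FALSE
)

# Convert the text to a proper Date class.
votes$vote_date <- as.Date(votes$vote_date)

# Alternative: whole-number years as integers.
votes$year <- as.integer(format(votes$vote_date, "%Y"))

# With idealstan loaded, the date column would be named in the call:
# to_idealstan <- id_make(score_data = votes,
#                         time_id = "vote_date",
#                         high_val = "Yes",
#                         low_val = "No")
```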

If the outcome is unbounded, i.e., a continuous variable or an unbounded discrete variable such as a Poisson count, simply set unbounded to TRUE. You can then ignore the high_val and low_val options. You can either specify a particular value as missing using miss_val, or leave miss_val at its default, in which case all missing values (NA) are recoded to a specific value outside the range of the outcome, which is then used to model the missingness.
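The recoding can be pictured with a few lines of base R. This is only an illustration of the rule stated under the unbounded argument (missing values become the maximum of the outcome plus one), not the package's own implementation; the counts are hypothetical.

```r
# Hypothetical count outcome with one missing value.
counts <- c(0L, 3L, NA, 7L, 2L)

# A code outside the observed range: maximum of the outcome + 1.
miss_code <- max(counts, na.rm = TRUE) + 1L

# Replace NA with that code so missingness can be modeled.
recoded <- ifelse(is.na(counts), miss_code, counts)
```

Here miss_code is 8 and recoded is 0, 3, 8, 7, 2.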

Examples

# NOT RUN {
# You can use either a pscl rollcall object or a long data frame
# with one row per person/legislator-item/bill observation

library(dplyr)

# First, using a rollcall object with the 114th Senate's rollcall votes:

data('senate114')

to_idealstan <- id_make(score_data = senate114,
                        outcome = 'cast_code',
                        person_id = 'bioname',
                        item_id = 'rollnumber',
                        group_id = 'party_code',
                        time_id = 'date',
                        high_val = 'Yes',
                        low_val = 'No',
                        miss_val = 'Absent')

# }
