To run an IRT model using idealstan, you must first process your data using the id_make function.
id_make(score_data = NULL, outcome = "outcome",
person_id = "person_id", item_id = "item_id", time_id = "time_id",
group_id = "group_id", simul_data = NULL, person_cov = NULL,
group_cov = NULL, item_cov = NULL, item_cov_miss = NULL,
miss_val = NA, high_val = NULL, low_val = NULL,
middle_val = NULL, unbounded = FALSE, exclude_level = NA,
simulation = FALSE)
A data frame in long form, i.e., one row for each measured score or vote, or a rollcall data object from package pscl.
Column name of the outcome in score_data, default is "outcome".
Column name of the person/legislator ID index in score_data, default is 'person_id'. Should be integer, character or factor.
Column name of the item/bill ID index in score_data, default is 'item_id'. Should be integer, character or factor.
Column name of the time values in score_data: optional, default is 'time_id'. Should be a date or date-time class, but can be an integer (i.e., years in whole numbers).
Column name of the person/legislator group IDs (e.g., parties) in score_data. Optional, default is 'group_id'. Should be integer, character or factor.
Optionally, data that has been generated by the id_sim_gen function.
A one-sided formula that specifies the covariates in score_data that will be used to hierarchically model the person/legislator ideal points.
A one-sided formula that specifies the covariates in score_data that will be used to hierarchically model the person/legislator ideal points at the group level. Use this in place of person_cov if you intend to run a group-level model.
A one-sided formula that specifies the covariates in score_data that will be used to hierarchically model the item/bill discrimination parameters for the regular model.
A one-sided formula that specifies the covariates in the dataset that will be used to hierarchically model the item/bill discrimination parameters for the missing data model.
The value (numeric or character) that indicates missing data/absences in the data. If missing data is coded as NA, simply leave this parameter at the default, NA.
The value (numeric or character) that indicates the highest discrete outcome possible, such as yes in a vote dataset or correct in a test examination.
The value (numeric or character) that indicates the lowest discrete outcome possible, such as no votes in a vote dataset or incorrect in a test examination.
The value (numeric or character) that indicates values between the lowest and highest categories, such as abstention in voting data or "Neither Agree nor Disagree" in Likert scales. If there are multiple possible values, pass a numeric or character vector of all such values in the correct order (lower to higher values). If there are no middle values (binary outcome), leave empty.
Whether or not the outcome/response is unbounded (i.e., continuous or Poisson). If it is, miss_val is recoded as the maximum of the outcome + 1.
A vector of any values that should be treated as NA in the response matrix. Unlike the miss_val parameter, these values will be dropped from the data before estimation rather than modeled explicitly.
If TRUE, simulated values are saved in the idealdata object for later plotting with the id_plot_sims function.
An idealdata object that can then be used in the id_estimate function to fit a model.
This function can accept either a rollcall data object from package pscl or a long data frame where one row equals one item-person (bill-legislator) observation with associated outcome. The preferred method is the long data frame, as passing a long data frame permits the inclusion of a wide range of covariates in the model, such as person-varying and item-varying (bill-varying) covariates.
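As a rough illustration of the covariate arguments, the sketch below builds a one-sided formula of the kind described for person_cov and checks that its variables exist as columns of the long data frame. The data frame and the column names `seniority` and `female` are hypothetical, and the checks are plain base R rather than anything id_make is known to do internally.

```r
# Hypothetical long data frame with two person-level covariates
legislators <- data.frame(
  person_id = c("Adams", "Baker", "Clark"),
  item_id   = "bill_1",
  outcome   = c("Yes", "No", "Yes"),
  seniority = c(12, 3, 7),
  female    = c(0, 1, 1),
  stringsAsFactors = FALSE
)

# A one-sided formula has no left-hand side, so its length is 2
person_cov <- ~ seniority + female
is_one_sided <- length(person_cov) == 2

# Every variable named in the formula should be a column in the data
cov_names   <- all.vars(person_cov)
all_present <- all(cov_names %in% names(legislators))
```

A formula like this would then be supplied as the person_cov argument, with score_data set to the data frame that holds those columns.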
If a rollcall object is passed to the function, the rollcall data is converted to a long data frame with data from the vote.data matrix used to determine dates for bills.
If passing a long data frame, you should specify the names of the columns containing the IDs for persons, items and groups (groups are IDs that may have multiple observations per ID, such as political parties or classes) to the id_make function, along with the name of the response/outcome. The only required columns are the item/bill ID and the person/legislator ID along with an outcome column.
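A minimal sketch of such a long data frame, using made-up legislators and bills, might look like the following. The toy data are hypothetical, and the call to id_make is only attempted if idealstan is installed and its id_make still has the arguments shown in the usage above, which is an assumption about the installed version.

```r
# Hypothetical toy data: 3 legislators x 2 bills = 6 rows, one per vote
votes <- expand.grid(
  person_id = c("Adams", "Baker", "Clark"),
  item_id   = c("bill_1", "bill_2"),
  stringsAsFactors = FALSE
)
votes$outcome <- factor(
  c("Yes", "No", "Yes", "No", "No", "Yes"),
  levels = c("No", "Yes")  # ascending order: lowest category first
)

# Guard: skip the call unless idealstan is available and id_make()
# exposes the argument names documented on this page (assumption).
if (requireNamespace("idealstan", quietly = TRUE) &&
    all(c("score_data", "outcome", "person_id", "item_id") %in%
        names(formals(idealstan::id_make)))) {
  toy_idealdata <- idealstan::id_make(
    score_data = votes,
    outcome    = "outcome",
    person_id  = "person_id",
    item_id    = "item_id"
  )
}
```

Only the three required columns are supplied here; time and group columns are left out because they are described as optional.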
The preferred format for the outcome column for discrete variables (binary or ordinal) is to pass a factor variable with levels in the correct order, i.e., in ascending order. For example, if using legislative data, the levels of the factor should be c('No','Yes').
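The level ordering can be set explicitly in base R when the factor is created. The short sketch below uses a hypothetical vector of votes and shows that the integer codes then follow the ascending order No, Yes.

```r
# Hypothetical raw votes stored as character strings
raw_votes <- c("Yes", "No", "Yes", "Yes", "No")

# Fix the level order explicitly: lowest category first
vote_factor <- factor(raw_votes, levels = c("No", "Yes"))

# Underlying integer codes: "No" is 1 and "Yes" is 2
vote_codes <- as.integer(vote_factor)
```

Setting the levels by hand avoids relying on the default alphabetical ordering, which happens to be correct for No/Yes but not for many other label sets.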
If a different kind of variable is passed, such as a character or numeric variable, you should consider specifying low_val, high_val and middle_val to determine the correct order of the discrete outcome. Specifying middle_val is only necessary if you are estimating an ordinal model.
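To see why the ordering matters for a character outcome, consider the hypothetical three-category response below. The sketch compares R's default alphabetical ordering with the ordering implied by a low, middle and high value. It illustrates the intended order only and is not a description of how id_make recodes the data internally.

```r
# Hypothetical character responses on a three-point scale
responses <- c("Agree", "Disagree", "Neither", "Agree", "Disagree")

# Default factor levels are alphabetical, which puts "Agree" lowest
default_levels <- levels(factor(responses))

# Values one would supply as low_val, middle_val and high_val
low_val    <- "Disagree"
middle_val <- "Neither"
high_val   <- "Agree"

# The ordering those three values imply, from lowest to highest
ordered_levels <- c(low_val, middle_val, high_val)
ordered_codes  <- as.integer(factor(responses, levels = ordered_levels))
```

With the explicit ordering, "Disagree" maps to 1, "Neither" to 2 and "Agree" to 3, which is the ascending order an ordinal model expects.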
If you do not specify a value for miss_val, then any NA values are assumed to be missing. If you do specify miss_val and you also have NA values in your data (assuming miss_val is not NA), then the function will treat the data coded as miss_val as missing data that should be modeled and will treat the NA data as ignorable missing data that will be removed (list-wise deletion) before estimating a model.
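The distinction between the two kinds of missingness can be sketched in base R. The data frame below is hypothetical, and the logical vectors only mirror the behavior described above; they are not taken from the package's source.

```r
# Hypothetical votes with both an explicit absence code and an NA
votes <- data.frame(
  person_id = c("Adams", "Baker", "Clark", "Davis"),
  item_id   = "bill_1",
  outcome   = c("Yes", "Absent", NA, "No"),
  stringsAsFactors = FALSE
)
miss_val <- "Absent"

# Rows coded as miss_val: missingness that would be modeled
modeled_missing <- !is.na(votes$outcome) & votes$outcome == miss_val

# Rows coded as NA: ignorable missingness that would be dropped
ignorable <- is.na(votes$outcome)

# What remains after list-wise deletion of the NA rows
kept <- votes[!ignorable, ]
```

In this toy case one row (Baker) is modeled as an absence, one row (Clark) is removed, and three rows remain for estimation.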
To run a time-varying model, you need to pass the name of a column containing dates (or integers) to the time_id option.
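If the dates arrive as character strings, they can be converted before the data are passed along. The sketch below uses hypothetical bill dates and shows both a Date column and an integer year column, either of which matches the description of the time_id argument.

```r
# Hypothetical bill dates stored as text
bills <- data.frame(
  item_id  = c("bill_1", "bill_2", "bill_3"),
  date_chr = c("2015-01-08", "2015-06-24", "2016-02-10"),
  stringsAsFactors = FALSE
)

# Convert to Date class (ISO format is parsed by default)
bills$time_id <- as.Date(bills$date_chr)

# Alternative: whole-number years as integers
bills$year <- as.integer(format(bills$time_id, "%Y"))
```

The name of whichever column is chosen, here `time_id` or `year`, is what would be given to the time_id option.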
If the outcome is unbounded, i.e., a continuous or an unbounded discrete variable like Poisson, simply set unbounded to TRUE. You can ignore the options that specify which values should be high_val or low_val. You can either specify a particular value as missing using miss_val, or all missing values (NA) will be recoded to a specific value out of the range of the outcome to use for modeling the missingness.
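A small sketch with hypothetical count data follows. It computes the out-of-range value that the unbounded argument description mentions (the maximum of the outcome plus one) and then attempts a guarded call. The data are made up, and the call assumes the installed id_make has the arguments documented on this page.

```r
# Hypothetical count outcome (e.g., number of items answered), with one NA
counts <- data.frame(
  person_id = rep(c("Adams", "Baker"), each = 3),
  item_id   = rep(c("q1", "q2", "q3"), times = 2),
  outcome   = c(4L, 0L, NA, 7L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Value outside the observed range, as described for unbounded outcomes:
# the maximum of the outcome plus one
recode_value <- max(counts$outcome, na.rm = TRUE) + 1

# Guard: skip the call unless idealstan is available and id_make()
# exposes the argument names documented on this page (assumption).
if (requireNamespace("idealstan", quietly = TRUE) &&
    all(c("score_data", "outcome", "unbounded") %in%
        names(formals(idealstan::id_make)))) {
  count_idealdata <- idealstan::id_make(
    score_data = counts,
    outcome    = "outcome",
    person_id  = "person_id",
    item_id    = "item_id",
    unbounded  = TRUE
  )
}
```

Here high_val and low_val are simply omitted, since they are not needed for an unbounded outcome.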
library(idealstan)
library(dplyr)
# id_make accepts either a pscl rollcall object or a long data frame
# with one row per person-item (legislator-bill) observation.
# Here the 114th Senate's rollcall votes are passed with explicit column names:
data('senate114')
to_idealstan <- id_make(score_data = senate114,
                        outcome = 'cast_code',
                        person_id = 'bioname',
                        item_id = 'rollnumber',
                        group_id = 'party_code',
                        time_id = 'date',
                        high_val = 'Yes',
                        low_val = 'No',
                        miss_val = 'Absent')