There are two ways of generating the simulated response data.
The first way is to pass a data frame of item metadata to the argument x. In the data frame, the first column should contain the item IDs,
the second column should contain the number of unique score categories of each item, and the third column should contain the IRT model fit to each item.
The available IRT models are "1PLM", "2PLM", "3PLM", and "DRM" for dichotomous item data, and "GRM" and "GPCM" for polytomous item data.
Note that "DRM" covers all dichotomous IRT models (i.e., "1PLM", "2PLM", and "3PLM") and "GRM" and "GPCM" represent the graded
response model and (generalized) partial credit model, respectively. The next columns should include the item parameters of the fitted IRT models.
For dichotomous items, the fourth, fifth, and sixth columns represent the item discrimination (or slope), item difficulty, and
item guessing parameters, respectively. When "1PLM" and "2PLM" are specified in the third column, NAs should be inserted in the sixth column
for the item guessing parameters. For polytomous items, the item discrimination (or slope) parameters should be included in the
fourth column and the item difficulty (or threshold) parameters of category boundaries should be contained from the fifth to the last columns.
When the number of unique score categories differs between items, the empty cells of item parameters should be filled with NAs.
In the irtplay package, the item difficulty (or threshold) parameters of category boundaries for GPCM are expressed as
the item location (or overall difficulty) parameter minus the threshold parameter for each unique score category of the item.
Note that when a GPCM item has K unique score categories, K-1 item difficulty parameters are necessary because
the item difficulty parameter for the first category boundary is always 0. For example, if a GPCM item has five score categories,
four item difficulty parameters should be specified. An example of a data frame for a single-format test is as follows:
ITEM1 | 2 | 1PLM | 1.000 | 1.461 | NA |
ITEM2 | 2 | 2PLM | 1.921 | -1.049 | NA |
ITEM3 | 2 | 3PLM | 1.736 | 1.501 | 0.203 |
ITEM4 | 2 | 3PLM | 0.835 | -1.049 | 0.182 |
And an example of a data frame for a mixed-format test is as follows:
ITEM1 | 2 | 1PLM | 1.000 | 1.461 | NA | NA | NA |
ITEM2 | 2 | 2PLM | 1.921 | -1.049 | NA | NA | NA |
ITEM3 | 2 | 3PLM | 0.926 | 0.394 | 0.099 | NA | NA |
ITEM4 | 2 | DRM | 1.052 | -0.407 | 0.201 | NA | NA |
ITEM5 | 4 | GRM | 1.913 | -1.869 | -1.238 | -0.714 | NA |
ITEM6 | 5 | GRM | 1.278 | -0.724 | -0.068 | 0.568 | 1.072 |
ITEM7 | 4 | GPCM | 1.137 | -0.374 | 0.215 | 0.848 | NA |
ITEM8 | 5 | GPCM | 1.233 | -2.078 | -1.347 | -0.705 | -0.116 |
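The mixed-format table above could be assembled by hand in base R roughly as follows. This is a minimal sketch: the column names (id, cats, model, par.1 to par.5) are illustrative assumptions, since only the column order is specified in this section.

```r
# Mixed-format item metadata, built by hand in base R.
# Column names are illustrative assumptions; only the column order
# (ID, score categories, model, then parameters) comes from the text.
x <- data.frame(
  id    = paste0("ITEM", 1:8),
  cats  = c(2, 2, 2, 2, 4, 5, 4, 5),
  model = c("1PLM", "2PLM", "3PLM", "DRM", "GRM", "GRM", "GPCM", "GPCM"),
  par.1 = c(1.000, 1.921, 0.926, 1.052, 1.913, 1.278, 1.137, 1.233),
  par.2 = c(1.461, -1.049, 0.394, -0.407, -1.869, -0.724, -0.374, -2.078),
  par.3 = c(NA, NA, 0.099, 0.201, -1.238, -0.068, 0.215, -1.347),
  par.4 = c(NA, NA, NA, NA, -0.714, 0.568, 0.848, -0.705),
  par.5 = c(NA, NA, NA, NA, NA, 1.072, NA, -0.116),
  stringsAsFactors = FALSE
)

# Sanity check: a polytomous item with K categories should carry
# K - 1 category-boundary parameters (columns 5 onward).
poly     <- x$cats > 2
n_bounds <- rowSums(!is.na(x[poly, paste0("par.", 2:5)]))
stopifnot(all(n_bounds == x$cats[poly] - 1))
```

The check at the end is only a convenience for catching misplaced NAs when rows with different numbers of score categories are padded to a common width.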
See the IRT Models section on the irtplay-package page for more details about the IRT models used in the irtplay package.
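To make the GPCM parameterization described earlier concrete, the short sketch below converts a location parameter and a set of threshold parameters into the category-boundary difficulties that would go in the fifth and later columns. The numeric values and the variable names (b_loc, tau, b_step) are made up for illustration and are not taken from the package.

```r
# Hypothetical GPCM item with K = 4 score categories (values are made up).
b_loc <- 0.5                 # item location (overall difficulty)
tau   <- c(1.0, 0.0, -1.0)   # K - 1 threshold parameters

# Category-boundary difficulties as described in the text:
# item location minus the threshold parameter.
b_step <- b_loc - tau
b_step                       # -0.5  0.5  1.5
```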
An easier way to create a data frame for the argument x is to use the function shape_df.
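As a rough illustration of what simulating dichotomous responses from such metadata involves, the base R sketch below draws 0/1 responses under the standard three-parameter logistic model, using the parameters of the single-format example above. It is not the package's implementation; the seed, the ability values, D = 1, and all variable names are assumptions made for the example.

```r
set.seed(1)                           # arbitrary seed for reproducibility

# Parameters of the four dichotomous items in the single-format example.
a <- c(1.000, 1.921, 1.736, 0.835)    # discrimination (fourth column)
b <- c(1.461, -1.049, 1.501, -1.049)  # difficulty (fifth column)
g <- c(NA, NA, 0.203, 0.182)          # guessing (sixth column)
g[is.na(g)] <- 0                      # 1PLM and 2PLM have no guessing

D     <- 1                            # assumed scaling constant
theta <- rnorm(10)                    # ten hypothetical examinee abilities

# 3PL probability of a correct response: examinees in rows, items in columns.
p <- sapply(seq_along(a), function(j) {
  g[j] + (1 - g[j]) * plogis(D * a[j] * (theta - b[j]))
})

# Draw one Bernoulli response per examinee-item pair.
resp <- matrix(rbinom(length(p), size = 1, prob = p), nrow = nrow(p))
dim(resp)
```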
The second way is to directly specify the item parameters of each item for which response data should be simulated
(i.e., without using a data frame, as shown in the examples that follow). In addition to the item parameters,
theta, cats, pmodel, and D should be specified as well. g.dc does not need to be specified when only
the 1PL and 2PL models are used for dichotomous item response data. For dichotomous items, 2s should be specified in cats.
For polytomous items, the number of unique score categories should be specified in cats. When a response data set is generated for
a mixed-format test, it is important to specify cats in the order in which the items appear in the test form. Suppose that the response
data of ten examinees are simulated with five items, including three dichotomous items and two polytomous items with three categories.
Also, suppose that the second and the fourth items are the polytomous items. Then, cats = c(2, 3, 2, 3, 2) should be used.
Additionally, if the response data for the first and second of those two polytomous items are simulated from the graded response model
and the generalized partial credit model, respectively, then pmodel = c('GRM', 'GPCM') should be used.
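The five-item scenario above can be laid out in base R as shown below. This sketch only prepares and cross-checks theta, cats, and pmodel; the item parameter arguments are left out, and the ability values and helper names (n_examinees, poly_pos) are hypothetical.

```r
n_examinees <- 10
theta <- rnorm(n_examinees)   # hypothetical abilities; any numeric vector would do

# Items 2 and 4 are polytomous with three categories; the rest are dichotomous.
cats <- c(2, 3, 2, 3, 2)

# One model per polytomous item, in test-form order (item 2, then item 4).
pmodel <- c("GRM", "GPCM")

# Consistency check: pmodel needs one entry per polytomous item.
poly_pos <- which(cats > 2)
stopifnot(length(pmodel) == length(poly_pos))

# Show which model is attached to which item position.
setNames(pmodel, paste0("item", poly_pos))
```

Checking the length of pmodel against the number of entries in cats greater than 2 is a cheap way to catch ordering mistakes before any responses are simulated.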