simdat: Simulated Response Data

Description

This function generates simulated response data for single- or mixed-format test forms. For dichotomous item response data, the IRT 1PL, 2PL, and 3PL models are available. For polytomous item response data, the graded response model, the partial credit model, and the generalized partial credit model are available.
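The snippet below is a rough base-R sketch of what simulating dichotomous responses involves. It draws 0/1 responses under the 3PL model by comparing uniform random numbers with the model probabilities. This is an assumption about the general technique, not a copy of simdat's implementation. The item parameter values mirror those used in example 1.

```r
# Sketch (not simdat's implementation): simulate 0/1 responses under the 3PL model
set.seed(123)
theta <- rnorm(10)      # 10 examinees
a <- c(1, 1.2, 1.3)     # discrimination (slope) parameters
b <- c(-1, 0, 1)        # difficulty parameters
g <- rep(0.2, 3)        # guessing parameters
D <- 1                  # scaling factor

# probability of a correct response: rows = examinees, columns = items
p <- sapply(seq_along(a), function(j) {
  g[j] + (1 - g[j]) / (1 + exp(-D * a[j] * (theta - b[j])))
})

# score 1 when a uniform draw falls below the model probability, else 0
resp <- (matrix(runif(length(p)), nrow = nrow(p)) < p) * 1L
resp
```

Setting every element of g to 0 turns this into the 2PL model.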

Usage

simdat(
  x = NULL,
  theta,
  a.dc,
  b.dc,
  g.dc = NULL,
  a.py,
  d.py,
  cats,
  pmodel,
  D = 1
)

Value

This function returns a vector or a matrix. When a matrix is returned, rows indicate theta values and columns represent items.

Arguments

x: A data frame containing the item metadata (e.g., item parameters, number of categories, and models). This data frame can be easily obtained using the function shape_df. See Details below.
theta: A vector of theta values.
a.dc: A vector of item discrimination (or slope) parameters for dichotomous IRT models.
b.dc: A vector of item difficulty (or threshold) parameters for dichotomous IRT models.
g.dc: A vector of item guessing parameters for dichotomous IRT models.
a.py: A vector of item discrimination (or slope) parameters for polytomous IRT models.
d.py: A list containing vectors of item threshold (or step) parameters for polytomous IRT models.
cats: A vector containing the number of score categories for items.
pmodel: A vector of character strings specifying the polytomous model from which response data are simulated. For each polytomous item, "GRM" for the graded response model or "GPCM" for the (generalized) partial credit model can be specified.
D: A scaling factor in IRT models. Setting D = 1.7 makes the logistic function closely approximate the normal ogive function. Default is 1.
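The effect of D can be checked numerically. The base-R snippet below is an illustration, not part of irtplay. It measures the largest vertical gap between the logistic curve plogis(D * x) and the normal ogive pnorm(x) over a grid, once for D = 1 and once for D = 1.7.

```r
# Largest absolute difference between the scaled logistic CDF and the
# standard normal CDF over a fine grid of x values
x <- seq(-4, 4, by = 0.001)
max_gap <- function(D) max(abs(plogis(D * x) - pnorm(x)))

max_gap(1)    # unscaled logistic: clearly different from the normal ogive
max_gap(1.7)  # scaled logistic: the two curves nearly coincide
```

With D = 1.7 the largest gap is below 0.01. With D = 1 it exceeds 0.1.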

Author

Hwanggyu Lim hglim83@gmail.com

Details

There are two ways of generating the simulated response data. The first way is to pass a data frame of item metadata through the argument x. In the data frame, the first column should contain the item IDs, the second column the number of unique score categories of each item, and the third column the IRT model fit to each item. The available IRT models are "1PLM", "2PLM", "3PLM", and "DRM" for dichotomous item data, and "GRM" and "GPCM" for polytomous item data. Note that "DRM" covers all dichotomous IRT models (i.e., "1PLM", "2PLM", and "3PLM"), and "GRM" and "GPCM" represent the graded response model and the (generalized) partial credit model, respectively.

The remaining columns should contain the item parameters of the fitted IRT models. For dichotomous items, the fourth, fifth, and sixth columns contain the item discrimination (or slope), item difficulty, and item guessing parameters, respectively. When "1PLM" or "2PLM" is specified in the third column, NA should be entered in the sixth column for the guessing parameter. For polytomous items, the item discrimination (or slope) parameter should be in the fourth column, and the item difficulty (or threshold) parameters of the category boundaries should be in the fifth through the last columns. When the number of unique score categories differs between items, the empty cells of item parameters should be filled with NAs.

In the irtplay package, the item difficulty (or threshold) parameters of the category boundaries for the GPCM are expressed as the item location (or overall difficulty) parameter minus the threshold parameter for each unique score category of the item. Note that when a GPCM item has K unique score categories, K-1 item difficulty parameters are necessary because the item difficulty parameter for the first category boundary is always 0. For example, if a GPCM item has five score categories, four item difficulty parameters should be specified.
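To see why only K-1 difficulty parameters are needed for a GPCM item, the sketch below computes GPCM category probabilities with the usual divide-by-total formula, prepending the fixed 0 for the first category boundary. Because that first term appears in every category's cumulative sum, it cancels when the probabilities are normalized. The function name gpcm_prob is hypothetical, and this is a base-R illustration, not irtplay's internal code.

```r
# Sketch only (not irtplay's source): GPCM category probabilities for one item
# at a single theta value. d holds the K-1 difficulty parameters; the
# difficulty for the first category boundary is fixed at 0 and prepended here.
gpcm_prob <- function(theta, a, d, D = 1) {
  b <- c(0, d)                      # K boundary terms, the first fixed at 0
  z <- cumsum(D * a * (theta - b))  # log-scale numerator for each category
  e <- exp(z - max(z))              # subtract max(z) to avoid overflow
  e / sum(e)                        # K probabilities that sum to 1
}

# ITEM7 from the mixed-format example: 4 categories, so 3 difficulties
p <- gpcm_prob(theta = 0.5, a = 1.137, d = c(-0.374, 0.215, 0.848))
round(p, 3)
```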
An example of a data frame with a single-format test is as follows:

ITEM1	2	1PLM	1.000	1.461	NA
ITEM2	2	2PLM	1.921	-1.049	NA
ITEM3	2	3PLM	1.736	1.501	0.203
ITEM4	2	3PLM	0.835	-1.049	0.182

And an example of a data frame for a mixed-format test is as follows:

ITEM1	2	1PLM	1.000	1.461	NA	NA	NA
ITEM2	2	2PLM	1.921	-1.049	NA	NA	NA
ITEM3	2	3PLM	0.926	0.394	0.099	NA	NA
ITEM4	2	DRM	1.052	-0.407	0.201	NA	NA
ITEM5	4	GRM	1.913	-1.869	-1.238	-0.714	NA
ITEM6	5	GRM	1.278	-0.724	-0.068	0.568	1.072
ITEM7	4	GPCM	1.137	-0.374	0.215	0.848	NA
ITEM8	5	GPCM	1.233	-2.078	-1.347	-0.705	-0.116
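The mixed-format example can also be typed in directly as a base R data frame. The column names used here (id, cats, model, par.1 to par.5) are an assumption for illustration, since the layout described in Details is positional. The final check confirms that every polytomous item carries one fewer difficulty parameter than its number of categories.

```r
# Hand-built version of the mixed-format item metadata shown above.
# Column names are illustrative only; the layout is defined by column position.
x <- data.frame(
  id    = paste0("ITEM", 1:8),
  cats  = c(2, 2, 2, 2, 4, 5, 4, 5),
  model = c("1PLM", "2PLM", "3PLM", "DRM", "GRM", "GRM", "GPCM", "GPCM"),
  par.1 = c(1.000, 1.921, 0.926, 1.052, 1.913, 1.278, 1.137, 1.233),
  par.2 = c(1.461, -1.049, 0.394, -0.407, -1.869, -0.724, -0.374, -2.078),
  par.3 = c(NA, NA, 0.099, 0.201, -1.238, -0.068, 0.215, -1.347),
  par.4 = c(NA, NA, NA, NA, -0.714, 0.568, 0.848, -0.705),
  par.5 = c(NA, NA, NA, NA, NA, 1.072, NA, -0.116),
  stringsAsFactors = FALSE
)

# Each polytomous item should have cats - 1 non-missing difficulty
# parameters in columns 5 through 8
poly <- x$cats > 2
n_diff <- rowSums(!is.na(x[poly, 5:8]))
stopifnot(all(n_diff == x$cats[poly] - 1))
```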

See the IRT Models section of the irtplay-package help page for more details about the IRT models used in the irtplay package. An easier way to create a data frame for the argument x is to use the function shape_df.

The second way is to specify the item parameters directly for each item whose response data should be simulated (i.e., without using a data frame, as in the examples below). In addition to the item parameters, theta, cats, pmodel, and D should be specified as well. g.dc does not need to be specified when only the 1PL and 2PL models are used for dichotomous items. In cats, specify 2 for each dichotomous item and the number of unique score categories for each polytomous item. When response data are generated for a mixed-format test, cats must follow the order of the items in the test form. For example, suppose that response data for ten examinees are simulated with five items: three dichotomous items and two polytomous items with three categories each, where the second and fourth items are the polytomous items. Then cats = c(2, 3, 2, 3, 2) should be used. Additionally, if the first of these two polytomous items is simulated from the graded response model and the second from the generalized partial credit model, then pmodel = c('GRM', 'GPCM') should be used.
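Because the order of cats drives the whole second approach, it can help to check the arguments before calling simdat. The helper check_direct_args is hypothetical and not part of irtplay. It encodes the rules stated above: one entry of a.dc and b.dc per dichotomous item, one entry of a.py, d.py, and pmodel per polytomous item, and K-1 step parameters for a polytomous item with K categories. The polytomous parameter values in the call are arbitrary illustrative numbers.

```r
# Hypothetical helper (not part of irtplay): verifies that directly specified
# arguments line up with cats, then reports where each polytomous item sits.
check_direct_args <- function(cats, a.dc, b.dc, a.py, d.py, pmodel) {
  dc <- which(cats == 2)  # positions of dichotomous items
  py <- which(cats > 2)   # positions of polytomous items
  stopifnot(
    length(a.dc) == length(dc), length(b.dc) == length(dc),
    length(a.py) == length(py), length(d.py) == length(py),
    length(pmodel) == length(py),
    # a polytomous item with K categories needs K-1 step parameters
    all(lengths(d.py) == cats[py] - 1)
  )
  data.frame(position = py, model = pmodel, cats = cats[py])
}

# The five-item scenario above: items 2 and 4 are polytomous
check_direct_args(cats = c(2, 3, 2, 3, 2),
                  a.dc = c(1, 1.2, 1.3), b.dc = c(-1, 0, 1),
                  a.py = c(1.1, 0.9), d.py = list(c(-0.5, 0.5), c(-1, 1)),
                  pmodel = c("GRM", "GPCM"))
```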

Examples

library(irtplay)

## example 1.
## simulates response data with a mixed-format test.
## for the first two polytomous items, the generalized partial credit model is used
## for the last polytomous item, the graded response model is used
# sample 100 examinees (the seed makes the results reproducible)
set.seed(1)
theta <- rnorm(100)

# set item parameters for three dichotomous items with the 3PL model
a.dc <- c(1, 1.2, 1.3); b.dc <- c(-1, 0, 1); g.dc <- rep(0.2, 3)

# set item parameters for three polytomous items
# note that 4, 4, and 5 categories are used for polytomous items
a.py <- c(1.3, 1.2, 1.7)
d.py <- list(c(-1.2, -0.3, 0.4), c(-0.2, 0.5, 1.6), c(-1.7, 0.2, 1.1, 2.0))

# create a numeric vector of score categories for both dichotomous and polytomous item data
# this score category vector is used to specify the location of the polytomous items
cats <- c(2, 2, 4, 4, 5, 2)

# create a character vector of the IRT model for the polytomous items
pmodel <- c('GPCM', 'GPCM', 'GRM')

# simulate the response data
# pass g.dc so that the dichotomous items actually follow the 3PL model
simdat(theta=theta, a.dc=a.dc, b.dc=b.dc, g.dc=g.dc,
       a.py=a.py, d.py=d.py, cats=cats, pmodel=pmodel, D=1)


## example 2.
## simulates response data with a single-format test with the 2PL model.
# create a numeric vector of score categories for the three 2PL model items
cats <- rep(2, 3)

# simulate the response data
simdat(theta=theta, a.dc=a.dc, b.dc=b.dc, cats=cats, D=1)

## example 3.
## the use of a "-prm.txt" file obtained from flexMIRT
# import the "-prm.txt" output file from flexMIRT
flex_prm <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtplay")

# read item parameters and transform them to item metadata
test_flex <- bring.flexmirt(file=flex_prm, "par")$Group1$full_df

# simulate the response data
simdat(x=test_flex, theta=theta, D=1) # use a data frame of item metadata
