sim: Simulate data

Description

Simulate data using item response theory (IRT) models.

Usage

sim(psi, xi)

Value

A list is returned. Depending on the models implied by the columns of psi and xi, its elements may include:

x: A matrix of item scores.
d: A matrix of item distractors.
r: A matrix of item responses.
y: A matrix of item log response times.

Arguments

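psi: A matrix of item parameters, with one row per item and named columns (see the model sections below).
xi: A matrix of person parameters, with one row per person and named columns (see the model sections below).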

Models for Item Scores

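The Rasch, 2PL, and 3PL models (Birnbaum, 1968; Rasch, 1960) are given by $$P(X_{vi} = 1 | \theta_v, a_i, b_i, c_i) = c_i + \frac{1 - c_i}{1 + \exp\{-a_i(\theta_v - b_i)\}}.$$ The 2PL model fixes the pseudo-guessing parameter at 0, and the Rasch model additionally fixes the discrimination parameter at 1.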

psi must contain columns named "a", "b", and "c" for the item discrimination, difficulty, and pseudo-guessing parameters, respectively.
xi must contain a column named "theta" for the person ability parameters.
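
As an illustration of the formula above, the following base-R sketch evaluates the 3PL success probability directly. The helper name p_3pl is hypothetical and is not part of the documented interface; it only mirrors the equation and is not necessarily how sim computes it.

```r
# Hypothetical helper (assumption, not part of the package): 3PL success
# probability for ability theta and item parameters a, b, c.
p_3pl <- function(theta, a, b, c) {
  # plogis(x) is the logistic function 1 / (1 + exp(-x))
  c + (1 - c) * plogis(a * (theta - b))
}

# Rasch special case (a = 1, c = 0), evaluated at theta equal to b
p_rasch <- p_3pl(theta = 0, a = 1, b = 0, c = 0)

# 3PL with pseudo-guessing 0.2, evaluated at theta equal to b
p_guess <- p_3pl(theta = 0, a = 1.2, b = 0, c = 0.2)
```

When theta equals b, the logistic term is 0.5, so the probability is halfway between c and 1.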

The partial credit model (PCM; Masters, 1982) and the generalized partial credit model (GPCM; Muraki, 1992) are given by $$P(X_{vi} = j | \theta_v, a_i, \boldsymbol{c_i}) = \frac{\exp\{\sum_{k=0}^j a_i(\theta_v - c_{ik})\}} {\sum_{l=0}^{m_i} \exp\{\sum_{k=0}^l a_i(\theta_v - c_{ik})\}}.$$ The PCM is the special case in which the discrimination parameter is fixed at 1.

psi must contain columns named "a" for the item discrimination parameter and "c0", "c1", ..., for the item category parameters.
xi must contain a column named "theta" for the person ability parameters.
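
The category probabilities above can be sketched in base R as follows. The helper name p_gpcm is hypothetical, and the code is only a direct transcription of the equation under that assumption, not the package's implementation.

```r
# Hypothetical helper (assumption, not part of the package): GPCM category
# probabilities for one person and one item.
# c_vec holds the category parameters c(c0, c1, ..., cm).
p_gpcm <- function(theta, a, c_vec) {
  # Inner sums over k = 0, ..., j for each category j
  z <- cumsum(a * (theta - c_vec))
  # Subtract the maximum before exponentiating for numerical stability
  ez <- exp(z - max(z))
  ez / sum(ez)
}

# Four score categories (0 to 3) with parameters symmetric around theta
p <- p_gpcm(theta = 0, a = 1, c_vec = c(0, -1, 0, 1))
```

Because the k = 0 term appears in every numerator and in every term of the denominator, it cancels, which is consistent with Example 2 fixing c0 at 0.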

The graded response model (GRM; Samejima, 1969) is given by $$P(X_{vi} = j | \theta_v, a_i, \boldsymbol{b_i}) = P(X_{vi} \ge j | \theta_v, a_i, \boldsymbol{b_i}) - P(X_{vi} \ge j + 1 | \theta_v, a_i, \boldsymbol{b_i}),$$ where $$P(X_{vi} \ge j | \theta_v, a_i, \boldsymbol{b_i}) = \begin{cases} 1 &\text{if } j = 0, \\ \frac{1}{1 + \exp\{-a_i(\theta_v - b_{ij})\}} &\text{if } 1 \le j \le m_i, \\ 0 &\text{if } j = m_i + 1. \end{cases}$$

psi must contain columns named "a" for the item discrimination parameter and "b1", "b2", ..., for the item location parameters listed in increasing order.
xi must contain a column named "theta" for the person ability parameters.
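
A minimal base-R sketch of the GRM category probabilities, built from differences of the cumulative probabilities defined above. The helper name p_grm is hypothetical and is not part of the documented interface.

```r
# Hypothetical helper (assumption, not part of the package): GRM category
# probabilities for one person and one item.
# b_vec holds the location parameters c(b1, ..., bm) in increasing order.
p_grm <- function(theta, a, b_vec) {
  # P(X >= j) for j = 0, 1, ..., m, m + 1
  p_ge <- c(1, plogis(a * (theta - b_vec)), 0)
  # P(X = j) = P(X >= j) - P(X >= j + 1) for j = 0, ..., m
  -diff(p_ge)
}

p <- p_grm(theta = 0, a = 1, b_vec = c(-1, 0, 1))
```

If the location parameters were not in increasing order, some of these differences could be negative, which is why psi lists them in increasing order.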

Models for Item Distractors

The nested logit model (NLM; Bolt et al., 2012) is given by $$P(D_{vi} = j | \theta_v, \eta_v, a_i, b_i, c_i, \boldsymbol{\lambda_i}, \boldsymbol{\zeta_i}) = [1 - P(X_{vi} = 1 | \theta_v, a_i, b_i, c_i)] \times P(D_{vi} = j | X_{vi} = 0, \eta_v, \boldsymbol{\lambda_i}, \boldsymbol{\zeta_i}),$$ where $$P(D_{vi} = j | X_{vi} = 0, \eta_v, \boldsymbol{\lambda_i}, \boldsymbol{\zeta_i}) = \frac{\exp(\lambda_{ij} \eta_v + \zeta_{ij})} {\sum_{k=1}^{m_i-1} \exp(\lambda_{ik} \eta_v + \zeta_{ik})}.$$

psi must contain columns named "a", "b", and "c" for the item discrimination, difficulty, and pseudo-guessing parameters, respectively, "lambda1", "lambda2", ..., for the item slope parameters, and "zeta1", "zeta2", ..., for the item intercept parameters.
xi must contain columns named "theta" and "eta" for the person parameters that govern response correctness and distractor selection, respectively.
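
The two-step structure of the NLM (correctness first, then distractor choice among incorrect responses) can be sketched in base R as below. The helper name p_nlm and its argument names are hypothetical; this is a transcription of the equations above, not the package's code.

```r
# Hypothetical helper (assumption, not part of the package): NLM probabilities
# for one person and one item. The pseudo-guessing parameter c_i is passed as
# g to avoid confusion with R's c() function.
p_nlm <- function(theta, eta, a, b, g, lambda, zeta) {
  # Step 1: probability of a correct response under the 3PL model
  p_correct <- g + (1 - g) * plogis(a * (theta - b))
  # Step 2: distractor probabilities conditional on an incorrect response
  z <- lambda * eta + zeta
  ez <- exp(z - max(z))
  p_cond <- ez / sum(ez)
  # Unconditional probabilities: correct response, then each distractor
  c(correct = p_correct, (1 - p_correct) * p_cond)
}

p <- p_nlm(theta = 0, eta = 0, a = 1, b = 0, g = 0.2,
           lambda = c(0.5, -0.5, 0), zeta = c(0, 0, 0))
```

With eta and all intercepts equal to 0, the three distractors are equally likely given an incorrect response.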

Models for Item Responses

The nominal response model (NRM; Bock, 1972) is given by $$P(R_{vi} = j | \eta_v, \boldsymbol{\lambda_i}, \boldsymbol{\zeta_i}) = \frac{\exp(\lambda_{ij} \eta_v + \zeta_{ij})} {\sum_{k=1}^{m_i} \exp(\lambda_{ik} \eta_v + \zeta_{ik})}.$$

psi must contain columns named "lambda1", "lambda2", ..., for the item slope parameters and "zeta1", "zeta2", ..., for the item intercept parameters. If there is a correct response category, its parameters should be listed last.
xi must contain a column named "eta" for the person parameters that govern response selection.
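
The NRM probabilities form a softmax over the response categories, which the following base-R sketch makes explicit. The helper name p_nrm is hypothetical and the code is only an illustration of the equation above.

```r
# Hypothetical helper (assumption, not part of the package): NRM response
# category probabilities for one person and one item.
p_nrm <- function(eta, lambda, zeta) {
  z <- lambda * eta + zeta
  # Subtract the maximum before exponentiating for numerical stability
  ez <- exp(z - max(z))
  ez / sum(ez)
}

# Four categories; the last one (the correct response) has the largest slope
p <- p_nrm(eta = 1, lambda = c(-0.5, -0.5, -0.5, 1.5), zeta = c(0, 0, 0, 0))
```

Here the last category is the most likely one for a person with eta = 1, because it has the largest slope and all intercepts are equal.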

Models for Item Log Response Times

The lognormal model (van der Linden, 2006) is given by $$f(Y_{vi} | \tau_v, \alpha_i, \beta_i) = \frac{\alpha_i}{\sqrt{2 \pi}} \exp\{-\frac{1}{2}[\alpha_i(Y_{vi} - (\beta_i - \tau_v))]^2\}.$$

psi must contain columns named "alpha" and "beta" for the item time discrimination and time intensity parameters, respectively.
xi must contain a column named "tau" for the person speed parameters.
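
The density above is that of a normal distribution for the log response time with mean beta - tau and standard deviation 1 / alpha. The base-R sketch below checks this against dnorm and draws a few log response times; the helper name d_logtime is hypothetical, and the draw shown is an assumption about one way to sample from the model, not a description of what sim does internally.

```r
# Hypothetical helper (assumption, not part of the package): density of the
# log response time y for person speed tau and item parameters alpha, beta.
d_logtime <- function(y, tau, alpha, beta) {
  alpha / sqrt(2 * pi) * exp(-0.5 * (alpha * (y - (beta - tau)))^2)
}

# The same density expressed through dnorm
d_manual <- d_logtime(y = 3.2, tau = 0.5, alpha = 2, beta = 3.5)
d_normal <- dnorm(3.2, mean = 3.5 - 0.5, sd = 1 / 2)

# Draw five log response times for this person-item pair
set.seed(1)
y_draws <- rnorm(5, mean = 3.5 - 0.5, sd = 1 / 2)
```

A larger alpha means less variable log response times, and a larger tau (a faster person) shifts them downward.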

References

Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397–479). Addison-Wesley.

Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37(1), 29–51.

Bolt, D. M., Wollack, J. A., & Suh, Y. (2012). Application of a multidimensional nested logit model to multiple-choice test items. Psychometrika, 77(2), 339–357.

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149–174.

Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16(2), 159–176.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Danish Institute for Educational Research.

Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika, 34(S1), 1–97.

van der Linden, W. J. (2006). A lognormal model for response times on test items. Journal of Educational and Behavioral Statistics, 31(2), 181–204.

Examples


# Setup for Examples 1 to 5 -------------------------------------------------

# Settings
set.seed(0)     # seed for reproducibility
N <- 500        # number of persons
n <- 40         # number of items

# Example 1: 3PL Model and Lognormal Model ----------------------------------

# Generate person parameters
xi <- MASS::mvrnorm(
  N,
  mu = c(theta = 0.00, tau = 0.00),
  Sigma = matrix(c(1.00, 0.25, 0.25, 0.25), ncol = 2)
)

# Generate item parameters
psi <- cbind(
  a = rlnorm(n, meanlog = 0.00, sdlog = 0.25),
  b = NA,
  c = runif(n, min = 0.05, max = 0.30),
  alpha = runif(n, min = 1.50, max = 2.50),
  beta = NA
)

# Generate positively correlated difficulty and time intensity parameters
psi[, c("b", "beta")] <- MASS::mvrnorm(
  n,
  mu = c(b = 0.00, beta = 3.50),
  Sigma = matrix(c(1.00, 0.20, 0.20, 0.15), ncol = 2)
)

# Simulate item scores and log response times
dat <- sim(psi, xi)
x <- dat$x
y <- dat$y

# Example 2: Generalized Partial Credit Model -------------------------------

# Generate person parameters
xi <- cbind(theta = rnorm(N, mean = 0.00, sd = 1.00))

# Generate item parameters
psi <- cbind(
  a = rlnorm(n, meanlog = 0.00, sdlog = 0.25),
  c0 = 0,
  c1 = rnorm(n, mean = -1.00, sd = 0.50),
  c2 = rnorm(n, mean = 0.00, sd = 0.50),
  c3 = rnorm(n, mean = 1.00, sd = 0.50)
)

# Simulate item scores
x <- sim(psi, xi)$x

# Example 3: Graded Response Model ------------------------------------------

# Generate person parameters
xi <- cbind(theta = rnorm(N, mean = 0.00, sd = 1.00))

# Generate item parameters
psi <- cbind(
  a = rlnorm(n, meanlog = 0.00, sdlog = 0.25),
  b1 = rnorm(n, mean = -1.00, sd = 0.50),
  b2 = rnorm(n, mean = 0.00, sd = 0.50),
  b3 = rnorm(n, mean = 1.00, sd = 0.50)
)

# Sort item location parameters in increasing order
psi[, paste0("b", 1:3)] <- t(apply(psi[, paste0("b", 1:3)], 1, sort))

# Simulate item scores
x <- sim(psi, xi)$x

# Example 4: Nested Logit Model ---------------------------------------------

# Generate person parameters
xi <- MASS::mvrnorm(
  N,
  mu = c(theta = 0.00, eta = 0.00),
  Sigma = matrix(c(1.00, 0.80, 0.80, 1.00), ncol = 2)
)

# Generate item parameters
psi <- cbind(
  a = rlnorm(n, meanlog = 0.00, sdlog = 0.25),
  b = rnorm(n, mean = 0.00, sd = 1.00),
  c = runif(n, min = 0.05, max = 0.30),
  lambda1 = rnorm(n, mean = 0.00, sd = 1.00),
  lambda2 = rnorm(n, mean = 0.00, sd = 1.00),
  lambda3 = rnorm(n, mean = 0.00, sd = 1.00),
  zeta1 = rnorm(n, mean = 0.00, sd = 1.00),
  zeta2 = rnorm(n, mean = 0.00, sd = 1.00),
  zeta3 = rnorm(n, mean = 0.00, sd = 1.00)
)

# Simulate item scores and distractors
dat <- sim(psi, xi)
x <- dat$x
d <- dat$d

# Example 5: Nominal Response Model -----------------------------------------

# Generate person parameters
xi <- cbind(eta = rnorm(N, mean = 0.00, sd = 1.00))

# Generate item parameters
psi <- cbind(
  lambda1 = rnorm(n, mean = -0.50, sd = 0.50),
  lambda2 = rnorm(n, mean = -0.50, sd = 0.50),
  lambda3 = rnorm(n, mean = -0.50, sd = 0.50),
  lambda4 = rnorm(n, mean = 1.50, sd = 0.50),
  zeta1 = rnorm(n, mean = -0.50, sd = 0.50),
  zeta2 = rnorm(n, mean = -0.50, sd = 0.50),
  zeta3 = rnorm(n, mean = -0.50, sd = 0.50),
  zeta4 = rnorm(n, mean = 1.50, sd = 0.50)
)

# Simulate item responses
r <- sim(psi, xi)$r
