catIrt: Simulate Computerized Adaptive Tests (CATs)

Description

catIrt simulates Computerized Adaptive Tests (CATs) given a vector/matrix of responses or a vector of ability values, a matrix of item parameters, and several item selection mechanisms, estimation procedures, and termination criteria.

Usage

catIrt( params, mod = c("brm", "grm"),
        resp = NULL,
        theta = NULL,
        catStart = list( n.start = 5, init.theta = 0,
                         select = c("UW-FI", "LW-FI", "PW-FI",
                                    "FP-KL", "VP-KL", "FI-KL", "VI-KL",
                                    "random"),
                         at = c("theta", "bounds"),
                         it.range = NULL, n.select = 1,
                         delta = .1,
                         score = c("fixed", "step", "random", "WLE", "BME", "EAP"),
                         range = c(-1, 1),
                         step.size = 3, leave.after.MLE = FALSE ),
        catMiddle = list( select = c("UW-FI", "LW-FI", "PW-FI",
                                     "FP-KL", "VP-KL", "FI-KL", "VI-KL",
                                     "random"),
                          at = c("theta", "bounds"),
                          it.range = NULL, n.select = 1,
                          delta = .1,
                          score = c("MLE", "WLE", "BME", "EAP"),
                          range = c(-6, 6),
                          expos = c("none", "SH") ),
        catTerm = list( term = c("fixed", "precision", "info", "class"),
                        score = c("MLE", "WLE", "BME", "EAP"),
                        n.min = 5, n.max = 50,
                        p.term = list(method = c("threshold", "change"),
                                      crit = .25),
                        i.term = list(method = c("threshold", "change"),
                                      crit = 2), 
                        c.term = list(method = c("SPRT", "GLR", "CI"),
                                      bounds = c(-1, 1),
                                      categ = c(0, 1, 2),
                                      delta = .1,
                                      alpha = .05, beta = .05,
                                      conf.lev = .95) ),
        ddist = dnorm,
        progress = TRUE, ...)
## S3 method for class 'catIrt':
summary( object, group = TRUE, ids = "none", \dots )
## S3 method for class 'catIrt':
plot( x, which = "all", ids = "none", 
      conf.lev = .95, legend = TRUE, ask = TRUE, \dots )

Arguments

object, x

a catIrt object.

params

numeric: a matrix of item parameters. If specified as a matrix, the rows must index the items, and the columns must designate the item parameters. For the binary response model, params must eithe

mod

character: a character string indicating the IRT model. Current support is for the 3-parameter binary response model (), and Samejima's graded response model (). The contents

resp

numeric: either a $N \times J$ matrix (where $N$ indicates the number of simulees and $J$ indicates the number of items), a $J$ length vector (if there is only one simulee), or NULL if specifying thetas

theta

numeric: either a $N$-dimensional vector (where $N$ indicates the number of simulees) or NULL if specifying resp.

catStart

list: a list of options for starting the CAT including:

n.start: a scalar indicating the number of items that are used for each simulee at the beginning of the CAT. Afterreaches the speci

catMiddle

list: a list of options for selecting/scoring during the middle of the CAT, including:

select: a character string indicating the item selection method for the remaining items. Seeselect

catTerm

list: a list of options for stopping/terminating the CAT, including:

term: a scalar/vector indicating the termination criterion/criteria. CATs can be terminated either through a fixed number of items (

ddist

function: a function indicating how to calculate prior densities for Bayesian estimation or particular item selection methods. For instance, if you wish to specify a normal prior, ddist = dnorm, and if yo

which

numeric: a scalar or vector of integers between 1 and 4, indicating which plots to include. The plots are as follows:

Bank Information
Bank SEM
CAT Information
CAT SEM

group

logical: TRUE or FALSE indicating whether to display a summary at the group level.

ids

numeric: a scalar or vector of integers between 1 and the number of simulees indicating which simulees to plot and/or summarize their CAT process and all of their $\theta$ estimates. ids can

conf.lev

numeric: a scalar between 0 and 1 indicating the desired confidence level plotted for the individual $\theta$ estimates.

legend

logical: TRUE or FALSE indicating whether the plot function should display a legend on the plot.

ask

logical: TRUE or FALSE indicating whether the plot function should ask between plots.

progress

logical: TRUE or FALSE indicating whether the catIrt function should display a progress bar during the CAT.

...

arguments passed to ddist or plot.catIrt, usually distribution parameters identified by name or graphical parameters.

Value

The function catIrt returns a list (of class "catIrt") with the following elements:
cat_thetaa vector of final CAT $\theta$ estimates.
cat_catega vector indicating the final classification of each simulee in the CAT. If term is not , cat_categ will be a vector of NA values.
cat_infoa vector of observed Fisher information based on the final CAT $\theta$ estimates and the item responses.
cat_sema vector of observed SEM estimates (or posterior standard deviations) based on the final CAT $\theta$ estimates and the item responses.
cat_lengtha vector indicating the number of items administered to each simulee in the CAT
cat_terma vector indicating how each CAT was terminated.
tot_thetaa vector of $\theta$ estimates given the entire item bank.
tot_catega vector indicating the classification of each simulee given the entire item bank.
tot_infoa vector of observed Fisher information based on the entire item bank worth of responses.
tot_sema vector of observed SEM estimates based on the entire item bank worth of responses.
true_thetaa vector of true $\theta$ values if specified by the user.
true_catega vector of true classification given $\theta$.
full_paramsthe full item bank.
full_respthe full set of responses.
cat_indiva list of $\theta$ estimates, observed SEM, observed information, the responses and the parameters chosen for each simulee over the entire CAT.
moda list of model specifications, as designated by the user, so that the CAT can be easily reproduced.

Details

The function catIrt performs a post-hoc computerized adaptive test (CAT), with a variety of user specified inputs. For a given person/simulee (e.g. simulee $i$), a CAT represents a simple set of stages surrounded by a while loop (e.g. Weiss and Kingsbury, 1984):

Item Selection: The next item is chosen based on a pre-specified criterion/criteria. For example, the classic item selection mechanism is picking an item such that it maximizes Fisher Information at the current estimate of$\theta_i$. Frequently, content balancing, item constraints, or item exposure will be taken into consideration at this point (aside from solely picking the "best item" for a given person). SeeitChoosefor current item selection methods.
Estimation:$\theta_i$is estimated based on updated information, usually relating to the just-selected item and the response associated with that item. In a post-hoc CAT, all of the responses already exist, but in a standard CAT, "item administration" would be between "item selection" and "estimation." The classic estimation mechanism is estimating$\theta_i$based off of maximizing the likelihood given parameters and a set of responses. Other estimation mechanisms correct for bias in the maximum likelihood estimate or add a prior information (such as a prior distribution of$\theta$). If an estimate is untenable (i.e. it returns a non-sensical value or$\infty$), the estimation procedure needs to have an alternative estimation mechanism. SeemleEstfor current estimation methods.
Termination: Either the test is terminated based on a pre-specified criterion/critera, or no termination criteria is satisfied, in which case the loop repeats. The standard termination criteria involve a fixed criterion (e.g. administering only 50 items), or a variable criterion (e.g. continuing until the observed SEM is below .3). Other termination criteria relate to cut-point tests (e.g. certification tests, classification tests), that depend not solely on ability but on whether that ability is estimated to exceed a threshold.catIrtterminates classification tests based on either the Sequential Probability Ratio Test (SPRT) (see Eggen, 1999), the Generalized Likelihood Ratio (GLR) (see Thompson, 2009), or the Confidence Interval Method (see Kingsbury & Weiss, 1983). Essentially, the SPRT compares the ratio of two likelihoods (e.g. the likelihood of the data given being in one category vs the likelihood of the data given being in the other category, as defined by$B + \delta$and$B - \delta$(where$B$separates the categories and$\delta$is the halfwidth of the indifference region) and compares that ratio with a ratio of error rates ($\alpha$and$\beta$) (see Wald, 1945). The GLR uses the maximum likelihood estimate in place of either$B + \delta$or$B - \delta$, and the confidence interval method terminates a CAT if the confidence interval surrounding an estimate of$\theta$is fully within one of the categories.

The CAT estimates $\theta_{i1}$ (an initial point) based on init.theta, and terminates the entire simulation after sequentially terminating each simulee's CAT.

References

Eggen, T. J. H. M. (1999). Item selection in adaptive testing with the sequential probability ratio test. Applied Psychological Measurement, 23, 249 -- 261.

Kingsbury, G. G., & Weiss (1983). A comparison of IRT-based adaptive mastery testing and a sequential mastery testing procedure. In D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 257--283). New York, NY: Academic Press.

Thompson, N. A. (2009). Using the generalized likelihood ratio as a termination criterion. In D. J. Weiss (Ed.), Proceedings of the 2009 GMAC conference on computerized adaptive testing.

Wainer, H. (Ed.). (2000). Computerized Adaptive Testing: A Primer (2nd Edition). Mahwah, NJ: ELawrence Erlbaum Associates.

Wald, A. (1945). Sequential tests of statistical hypotheses. Annals of Mathematical Statistics, 16, 117 -- 186.

Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21, 361-375.

Examples

Run this code

#########################
# Binary Response Model #
#########################
set.seed(888)
# generating random theta:
theta <- rnorm(50)
# generating an item bank under a 2-parameter binary response model:
b.params <- cbind(a = runif(100, .5, 1.5), b = rnorm(100, 0, 2), c = 0)
# simulating responses:
b.resp <- simIrt(theta = theta, params = b.params, mod = "brm")$resp


## CAT 1 ##
# the typical, classic post-hoc CAT:
catStart1 <- list(init.theta = 0, n.start = 5,
                  select = "UW-FI", at = "theta",
                  n.select = 4, it.range = c(-1, 1),
                  score = "step", range = c(-1, 1),
                  step.size = 3, leave.after.MLE = FALSE)
catMiddle1 <- list(select = "UW-FI", at = "theta",
                   n.select = 1, it.range = NULL,
                   score = "MLE", range = c(-6, 6),
                   expos = "none")
catTerm1 <- list(term = "fixed", n.min = 10, n.max = 50)

cat1 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle1,
               catTerm = catTerm1)

# we can print, summarize, and plot:
cat1                                        # prints theta because
                                            # we have fewer than
                                            # 200 simulees
summary(cat1, group = TRUE, ids = "none")   # nice summary!

summary(cat1, group = FALSE, ids = 1:4)     # summarizing people too! :)

par(mfrow = c(2, 2))
plot(cat1, ask = FALSE)               # 2-parameter model, so expected FI
                                      # and observed FI are the same
par(mfrow = c(1, 1))

# we can also plot particular simulees:
par(mfrow = c(2, 1))
plot(cat1, which = "none", ids = c(1, 30), ask = FALSE)
par(mfrow = c(1, 1))


## CAT 2 ##
# using Fixed Point KL info rather than Unweighted FI to select items:
catStart2 <- catStart1
catMiddle2 <- catMiddle1
catTerm2 <- catTerm1

catStart2$leave.after.MLE <- TRUE         # leave after mixed response pattern
catMiddle2$select <- "FP-KL"
catMiddle2$at <- "bounds"
catMiddle2$delta <- .2
catTerm2$c.term <- list(bounds = 0)
cat2 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart2,
               catMiddle = catMiddle2,
               catTerm = catTerm2)
cor(cat1$cat_theta, cat2$cat_theta)       # very close!

summary(cat2, group = FALSE, ids = 1:4)   # rarely 5 starting items!


## CAT 3/4 ##
# using "precision" rather than "fixed" to terminate:
catTerm1$term <- catTerm2$term <- "precision"
catTerm1$p.term <- catTerm2$p.term <- list(method = "threshold", crit = .3)
cat3 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle1,
               catTerm = catTerm1)
cat4 <- catIrt(params = b.params, mod = "brm",
			   resp = b.resp,
			   catStart = catStart2,
			   catMiddle = catMiddle2,
			   catTerm = catTerm2)

mean(cat3$cat_length - cat4$cat_length) # KL info results in slightly more items


## CAT 5/6 ##
# classification CAT with a boundary of 0 (with default classification stuff):
catTerm5 <- list(term = "class", n.min = 10, n.max = 50,
                 c.term = list(method = "SPRT",
                               bounds = 0, delta = .2,
                               alpha = .10, beta = .10))
cat5 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle1,
               catTerm = catTerm5)
cat6 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle2,
               catTerm = catTerm5)

# how many were classified correctly?
mean(cat5$cat_categ == cat5$tot_categ)

# using a different selection mechanism, we get the similar results:
mean(cat6$cat_categ == cat6$tot_categ)


## CAT 7 ##
# we could change estimation to EAP with the default (normal) prior:
catMiddle7 <- catMiddle1
catMiddle7$score <- "EAP"
cat7 <- catIrt(params = b.params, mod = "brm", # much slower!
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle7,
               catTerm = catTerm1)
cor(cat1$cat_theta, cat7$cat_theta)            # pretty much the same


## CAT 8 ##
# let's specify the prior as something strange:
cat8 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle7,
               catTerm = catTerm1,
               ddist = dchisq, df = 4)

cat8   # all positive values of "theta"


## CAT 9 ##
# finally, we can have:
#   - more than one termination criteria,
#   - individual bounds per person,
#   - simulating based on theta without a response matrix.
catTerm9 <- list(term = c("fixed", "class"),
                 n.min = 10, n.max = 50,
                 c.term = list(method = "SPRT",
                               bounds = cbind(runif(length(theta), -1, 0),
                                              runif(length(theta), 0, 1)),
                               delta = .2,
                               alpha = .1, beta = .1))
cat9 <- catIrt(params = b.params, mod = "brm",
               resp = NULL, theta = theta,
               catStart = catStart1,
               catMiddle = catMiddle1,
               catTerm = catTerm9)

summary(cat9)   # see "... with Each Termination Criterion"


#########################
# Graded Response Model #
#########################
# generating random theta
theta <- rnorm(201)
# generating an item bank under a graded response model:
g.params <- cbind(a = runif(100, .5, 1.5), b1 = rnorm(100), b2 = rnorm(100),
                                           b3 = rnorm(100), b4 = rnorm(100))

# the graded response model is exactly the same, only slower!
cat10 <- catIrt(params = g.params, mod = "grm",
                resp = NULL, theta = theta,
                catStart = catStart1,
                catMiddle = catMiddle1,
                catTerm = catTerm1)

# warning because it.range cannot be specified for graded response models!

# if there is more than 200 simulees, it doesn't print individual thetas:
cat10

# play around with things - CATs are fun - a little frisky, but fun.

Run the code above in your browser using DataLab