catIrt: Simulate Computerized Adaptive Tests (CATs)

Description

catIrt simulates Computerized Adaptive Tests (CATs) given a vector/matrix of responses or a vector of ability values, a matrix of item parameters, and several item selection mechanisms, estimation procedures, and termination criteria.

Usage

catIrt( params, mod = c("brm", "grm"),
        resp = NULL,
        theta = NULL,
        catStart = list( n.start = 5, init.theta = 0,
                         select = c("UW-FI", "LW-FI", "PW-FI",
                                    "FP-KL", "VP-KL", "FI-KL", "VI-KL",
                                    "random"),
                         at = c("theta", "bounds"),
                         it.range = NULL, n.select = 1,
                         delta = .1,
                         score = c("fixed", "step", "random", "WLE", "BME", "EAP"),
                         range = c(-1, 1),
                         step.size = 3, leave.after.MLE = FALSE ),
        catMiddle = list( select = c("UW-FI", "LW-FI", "PW-FI",
                                     "FP-KL", "VP-KL", "FI-KL", "VI-KL",
                                     "random"),
                          at = c("theta", "bounds"),
                          it.range = NULL, n.select = 1,
                          delta = .1,
                          score = c("MLE", "WLE", "BME", "EAP"),
                          range = c(-6, 6),
                          expos = c("none", "SH") ),
        catTerm = list( term = c("fixed", "precision", "info", "class"),
                        score = c("MLE", "WLE", "BME", "EAP"),
                        n.min = 5, n.max = 50,
                        p.term = list(method = c("threshold", "change"),
                                      crit = .25),
                        i.term = list(method = c("threshold", "change"),
                                      crit = 2), 
                        c.term = list(method = c("SPRT", "GLR", "CI"),
                                      bounds = c(-1, 1),
                                      categ = c(0, 1, 2),
                                      delta = .1,
                                      alpha = .05, beta = .05,
                                      conf.lev = .95) ),
        ddist = dnorm,
        progress = TRUE, … )
# S3 method for catIrt
summary( object, group = TRUE, ids = "none", … )
# S3 method for catIrt
plot( x, which = "all", ids = "none", 
      conf.lev = .95, legend = TRUE, ask = TRUE, … )

Arguments

object, x

a catIrt object.

params

numeric: a matrix of item parameters. The rows must index the items, and the columns must designate the item parameters. For the binary response model, params must be either a 3-column matrix (if not using item exposure control), a 4-column matrix (with Sympson-Hetter parameters as the last column if using item exposure control, or with the item number as the first column), or a 5-column matrix (with both the item number as the first column and Sympson-Hetter parameters as the last column). See Details for more information.

mod

character: a character string indicating the IRT model. Current support is for the 3-parameter binary response model ("brm"), and Samejima's graded response model ("grm"). The contents of params must match the designation of mod. If mod is left blank, it will be designated the class of resp (if resp inherits either "brm" or "grm"), and if that fails, it will ask the user (if in interactive mode) or error.

resp

numeric: either an $N \times J$ matrix (where $N$ indicates the number of simulees and $J$ indicates the number of items), a length-$J$ vector (if there is only one simulee), or NULL if specifying thetas. For the binary response model ("brm"), resp must solely contain 0s and 1s. For the graded response model ("grm"), resp must solely contain integers $1, \dots, K$, where $K$ is the number of categories, as indicated by the dimension of params.

theta

numeric: either an $N$-dimensional vector (where $N$ indicates the number of simulees) or NULL if specifying resp.

catStart

list: a list of options for starting the CAT including:

n.start: a scalar indicating the number of items that are used for each simulee at the beginning of the CAT. After n.start items have been administered, the CAT will shift to the middle set of parameters.
init.theta: a scalar or vector of initial starting estimates of $θ$ . If init.theta is a scalar, every simulee will have the same starting value. Otherwise, simulees will have different starting values based on the respective element of init.theta.
select: a character string indicating the item selection method for the first few items. Items can be selected either through maximum Fisher information or Kullback-Leibler divergence methods or randomly. The Fisher information methods include
- "UW-FI": unweighted Fisher information at a point.
- "LW-FI": Fisher information weighted across the likelihood function.
- "PW-FI": Fisher information weighted across the posterior distribution of $θ$ .
And the Kullback-Leibler divergence methods include
- "FP-KL": pointwise KL divergence between [P +/- delta], where P is either the current $θ$ estimate or a classification bound.
- "VP-KL": pointwise KL divergence between [P +/- delta/sqrt(n)], where n is the number of items given to this point in the CAT.
- "FI-KL": KL divergence integrated along [P -/+ delta] with respect to P.
- "VI-KL": KL divergence integrated along [P -/+ delta/sqrt(n)] with respect to P.
See itChoose for more information.
at: a character string indicating where to select items. If select is "UW-FI" and at is "theta", then items will be selected to maximize Fisher information at the proximate $θ$ estimates.
it.range: Either a 2-element numeric vector indicating the minimum and maximum allowed difficulty parameters for items selected during the starting portion of the CAT (only if mod is equal to "brm") or NULL indicating no item parameter restrictions. See itChoose for more information.
n.select: an integer indicating the number of items to select at one time. For instance, if select is "UW-FI", at is "theta", and n.select is 5, the item choosing function will randomly select between the top 5 items that maximize expected Fisher information at proximate $θ$ estimates.
delta: a scalar indicating the multiplier used in initial item selection if a Kullback-Leibler method is chosen.
score: a character string indicating the $θ$ estimation method. As of now, the options for scoring the first few items are "fixed" (at init.theta), "step" (by adding or subtracting step.size from the $θ$ estimate after each item), Weighted Likelihood Estimation ("WLE"), Bayesian Modal Estimation ("BME"), and Expected A-Posteriori Estimation ("EAP"). The latter two allow user-specified prior distributions through density (d...) functions. See mleEst for more information.
range: a 2-element numeric vector indicating the minimum and maximum that $θ$ should be estimated in the starting portion of the CAT.
step.size: a scalar indicating how much to increment or decrement the estimate of $θ$ if score is set to "step".
leave.after.MLE: a logical indicating whether to skip the remainder of the starting items if the user has a mixed response pattern and/or a finite maximum likelihood estimate of $θ$ can be achieved.
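
The selection criteria listed under select can be illustrated for the binary response model in a few lines of base R. This is only a minimal sketch and not the package's own code: the helper names p_brm, fi_brm, and kl_brm are hypothetical, the logistic form without a scaling constant is an assumption, and so is the direction of the divergence (from P + delta to P - delta). See itChoose for the methods that catIrt itself applies.

```r
# Probability of a correct response under a 3-parameter binary model
# (assumed form: g + (1 - g) * logistic(a * (theta - b)), g = guessing).
p_brm <- function(theta, a, b, g = 0) g + (1 - g) * plogis(a * (theta - b))

# Expected Fisher information of each item at a single point theta.
fi_brm <- function(theta, a, b, g = 0) {
  p <- p_brm(theta, a, b, g)
  a^2 * ((1 - p) / p) * ((p - g) / (1 - g))^2
}

# Pointwise KL divergence between P + delta and P - delta for each item.
kl_brm <- function(P, delta, a, b, g = 0) {
  p1 <- p_brm(P + delta, a, b, g)
  p0 <- p_brm(P - delta, a, b, g)
  p1 * log(p1 / p0) + (1 - p1) * log((1 - p1) / (1 - p0))
}

# A small illustrative bank: rows are items, columns are a, b, c.
bank <- cbind(a = c(1.0, 1.5, 0.8), b = c(0.0, 1.0, -1.0), c = c(0, 0.2, 0))

theta_hat <- 0
info <- fi_brm(theta_hat, bank[, "a"], bank[, "b"], bank[, "c"])
kl   <- kl_brm(theta_hat, delta = 0.1, bank[, "a"], bank[, "b"], bank[, "c"])

best_fi <- which.max(info)   # "UW-FI"-style choice at theta_hat
best_kl <- which.max(kl)     # "FP-KL"-style choice at theta_hat
```

With n.select greater than 1, the choice would instead be drawn at random from the n.select highest-ranked items.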

catMiddle

list: a list of options for selecting/scoring during the middle of the CAT, including:

select: a character string indicating the item selection method for the remaining items. See select in catStart for an explanation of the options.
at: a character string indicating where to select items. See at in catStart for an explanation of the options.
it.range: Either a 2-element numeric vector indicating the minimum and maximum allowed difficulty parameters for items selected during the middle portion of the CAT (only if mod is equal to "brm") or NULL indicating no item parameter restrictions. See itChoose for more information.
n.select: an integer indicating the number of items to select at one time.
delta: a scalar indicating the multiplier used in middle item selection if a Kullback-Leibler method is chosen.
score: a character string indicating the $θ$ estimation method. As of now, the options for scoring the remaining items are Maximum Likelihood Estimation ("MLE"), Weighted Likelihood Estimation ("WLE"), Bayesian Modal Estimation ("BME"), and Expected A-Posteriori Estimation ("EAP"). The latter two allow user specified prior distributions through density (d...) functions. See mleEst for more information.
range: a 2-element numeric vector indicating the minimum and maximum that $θ$ should be estimated in the middle portion of the CAT.
expos: a character string indicating whether no item exposure controls should be implemented ("none") or whether the CAT should use Sympson-Hetter exposure controls ("SH"). If (and only if) expos is equal to "SH", the last column of the parameter matrix should indicate the probability of an item being administered given that it is selected.
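
The role of the Sympson-Hetter column can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: sh_select is a hypothetical helper, and the rule that a rejected item is skipped in favour of the next-best candidate is assumed here rather than taken from catIrt.

```r
# Walk through the items from most to least informative and administer the
# first one that passes its exposure lottery.
#   info:    information of each item at the current theta estimate
#   sh_prob: P(administer | selected) for each item (the last params column)
sh_select <- function(info, sh_prob) {
  for (j in order(info, decreasing = TRUE)) {
    if (runif(1) <= sh_prob[j]) return(j)   # item j is administered
  }
  NA_integer_                               # every candidate was rejected
}

info    <- c(0.9, 0.7, 0.4)
sh_prob <- c(0, 1, 1)        # the most informative item is never exposed
chosen  <- sh_select(info, sh_prob)
```

Setting every exposure probability to 1 reduces the sketch to plain maximum-information selection.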

catTerm

list: a list of options for stopping/terminating the CAT, including:

term: a scalar/vector indicating the termination criterion/criteria. CATs can be terminated after a fixed number of items ("fixed"), declared through the n.max argument; based on the SEM of a simulee ("precision"), declared through the p.term argument; based on the test information of a simulee at a particular point in the CAT ("info"), declared through the i.term argument; and/or when a simulee falls into a category ("class"), declared through the c.term argument. If more than one termination criterion is selected, the CAT will terminate as soon as the first of them is satisfied for a given simulee.
score: a character string indicating the $θ$ estimation method for all of the responses in the bank. score is used to estimate $θ$ given the entire bank of item responses and parameter set. If the theta estimated using all of the responses is far away from $θ$ , the size of the item bank is probably too small. The options for score in catTerm are identical to the options of score in catMiddle.
n.min: an integer indicating the minimum number of items that a simulee should "take" before any of the termination criteria are checked.
n.max: an integer indicating the maximum number of items to administer before terminating the CAT.
p.term: a list indicating the parameters of a precision-based stopping rule, only if term is "precision", including:
1. method: a character string indicating whether to terminate the CAT when the SEM dips below a threshold ("threshold") or changes less than a particular amount ("change").
2. crit: a scalar indicating either the maximum SEM of a simulee before terminating the CAT or the maximum change in the simulee's SEM before terminating the CAT.
i.term: a list indicating the parameters of an information-based stopping rule, only if term is "info", including:
1. method: a character string indicating whether to terminate the CAT when FI exceeds a threshold ("threshold") or changes less than a particular amount ("change").
2. crit: a scalar indicating either the minimum FI of a simulee before terminating the CAT or the maximum change in the simulee's FI before terminating the CAT.
c.term: a list indicating the parameters of a classification CAT, only if term is "class" or any of the selection methods are at one or more "bounds", including:
1. method: a scalar indicating the method used for a classification CAT. As of now, the classification CAT options are the Sequential Probability Ratio Test ("SPRT"), the Generalized Likelihood Ratio ("GLR"), or the Confidence Interval method ("CI").
2. bounds: a scalar, vector, or matrix of classification bounds. If specified as a scalar, there will be one bound for each simulee at that value. If specified as an $N$-dimensional vector, there will be one bound for each simulee. If specified as a $k < N$-dimensional vector, there will be $k$ bounds for each simulee at those values. And if specified as an $N \times k$-element matrix, there will be $k$ bounds for each simulee.
3. categ: a vector indicating the names of the categories into which the simulees should be classified. The length of categ should be one greater than the length of bounds.
4. delta: a scalar indicating the half-width of an indifference region when performing an SPRT-based classification CAT or selecting items by Kullback-Leibler divergence. See Eggen (1999) and KL for more information.
5. alpha: a scalar indicating the specified Type I error rate for performing an SPRT-based classification CAT.
6. beta: a scalar indicating the specified Type II error rate for performing an SPRT-based classification CAT.
7. conf.lev: a scalar between 0 and 1 indicating the confidence level used when performing a confidence-based ("CI") classification CAT.
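
How bounds, delta, alpha, and beta interact in an SPRT-based classification can be sketched in base R. The helper sprt_decide is hypothetical, it handles a single bound with a 2-parameter binary model only, and it uses Wald's classical decision limits; catIrt's own implementation may differ in its details.

```r
# Sequential probability ratio test around one classification bound.
#   resp:  0/1 responses to the items administered so far
#   a, b:  discrimination and difficulty of those items
sprt_decide <- function(resp, a, b, bound, delta = 0.1,
                        alpha = 0.05, beta = 0.05) {
  loglik <- function(theta) {
    p <- plogis(a * (theta - b))
    sum(resp * log(p) + (1 - resp) * log(1 - p))
  }
  # log likelihood ratio of "above the bound" versus "below the bound"
  log_ratio <- loglik(bound + delta) - loglik(bound - delta)
  upper <- log((1 - beta) / alpha)    # accept "above" at or beyond this
  lower <- log(beta / (1 - alpha))    # accept "below" at or beyond this
  if (log_ratio >= upper) {
    "above"
  } else if (log_ratio <= lower) {
    "below"
  } else {
    "continue"
  }
}

a <- rep(1.5, 20)
b <- rep(0, 20)
d_high  <- sprt_decide(rep(1, 20), a, b, bound = 0, delta = 0.2)
d_low   <- sprt_decide(rep(0, 20), a, b, bound = 0, delta = 0.2)
d_mixed <- sprt_decide(rep(c(0, 1), 10), a, b, bound = 0, delta = 0.2)
```

A larger delta or larger error rates make the test stop sooner, at the cost of more misclassifications near the bound.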

ddist

function: a function indicating how to calculate prior densities for Bayesian estimation or particular item selection methods. For instance, if you wish to specify a normal prior, ddist = dnorm, and if you wish to specify a uniform prior, ddist = dunif. Note that it is standard in R to use d… to indicate a density. See itChoose for more information.
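
The way a density function and its named parameters could enter a Bayesian estimate can be sketched in base R. The function eap_sketch is hypothetical, it assumes a 2-parameter binary model and a simple equally spaced grid, and it is meant only to show the ddist and ... mechanism, not to reproduce the package's estimator.

```r
# Expected a-posteriori estimate of theta on a grid, with the prior supplied
# as a density function and its parameters forwarded through `...`.
eap_sketch <- function(resp, a, b, ddist = dnorm, ...,
                       range = c(-6, 6), n.quad = 201) {
  grid <- seq(range[1], range[2], length.out = n.quad)
  lik <- vapply(grid, function(th) {
    p <- plogis(a * (th - b))
    prod(p^resp * (1 - p)^(1 - resp))
  }, numeric(1))
  post <- lik * ddist(grid, ...)      # unnormalised posterior on the grid
  sum(grid * post) / sum(post)        # posterior mean
}

resp <- c(0, 0, 0, 1, 0)
a <- c(1.0, 1.2, 0.8, 1.5, 1.0)
b <- c(-1.0, -0.5, 0.0, 0.5, 1.0)

est_norm  <- eap_sketch(resp, a, b)                                # N(0, 1)
est_unif  <- eap_sketch(resp, a, b, ddist = dunif, min = -1, max = 1)
est_chisq <- eap_sketch(resp, a, b, ddist = dchisq, df = 4)
```

Because a chi-square density is zero for non-positive values, the last estimate is forced to be positive regardless of the responses.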

which

numeric: a scalar or vector of integers between 1 and 4, indicating which plots to include. The plots are as follows:

1. Bank Information
2. Bank SEM
3. CAT Information
4. CAT SEM

which can also be "none", in which case plot.catIrt will not plot any information functions, or it can be "all", in which case plot.catIrt will plot all four information functions.

group

logical: TRUE or FALSE indicating whether to display a summary at the group level.

ids

numeric: a scalar or vector of integers between 1 and the number of simulees indicating which simulees to plot and/or summarize their CAT process and all of their $θ$ estimates. ids can also be "none" (or, equivalently, NULL) or "all".

conf.lev

numeric: a scalar between 0 and 1 indicating the desired confidence level plotted for the individual $θ$ estimates.

legend

logical: TRUE or FALSE indicating whether the plot function should display a legend on the plot.

ask

logical: TRUE or FALSE indicating whether the plot function should ask between plots.

progress

logical: TRUE or FALSE indicating whether the catIrt function should display a progress bar during the CAT.

…

arguments passed to ddist or plot.catIrt, usually distribution parameters identified by name or graphical parameters.

Value

The function catIrt returns a list (of class "catIrt") with the following elements:

cat_theta

a vector of final CAT $θ$ estimates.

cat_categ

a vector indicating the final classification of each simulee in the CAT. If term is not "class", cat_categ will be a vector of NA values.

cat_info

a vector of observed Fisher information based on the final CAT $θ$ estimates and the item responses.

cat_sem

a vector of observed SEM estimates (or posterior standard deviations) based on the final CAT $θ$ estimates and the item responses.

cat_length

a vector indicating the number of items administered to each simulee in the CAT.

cat_term

a vector indicating how each CAT was terminated.

tot_theta

a vector of $θ$ estimates given the entire item bank.

tot_categ

a vector indicating the classification of each simulee given the entire item bank.

tot_info

a vector of observed Fisher information based on the entire item bank worth of responses.

tot_sem

a vector of observed SEM estimates based on the entire item bank worth of responses.

true_theta

a vector of true $θ$ values if specified by the user.

true_categ

a vector of true classification given $θ$ .

full_params

the full item bank.

full_resp

the full set of responses.

cat_indiv

a list of $θ$ estimates, observed SEM, observed information, the responses and the parameters chosen for each simulee over the entire CAT.

mod

a list of model specifications, as designated by the user, so that the CAT can be easily reproduced.

Details

The function catIrt performs a post-hoc computerized adaptive test (CAT) with a variety of user-specified inputs. For a given person/simulee (e.g. simulee $i$), a CAT represents a simple set of stages surrounded by a while loop (e.g. Weiss and Kingsbury, 1984):

1. Item Selection: The next item is chosen based on a pre-specified criterion/criteria. For example, the classic item selection mechanism is picking the item that maximizes Fisher information at the current estimate of $θ_{i}$. Frequently, content balancing, item constraints, or item exposure will be taken into consideration at this point (aside from solely picking the "best item" for a given person). See itChoose for current item selection methods.
2. Estimation: $θ_{i}$ is estimated based on updated information, usually relating to the just-selected item and the response associated with that item. In a post-hoc CAT, all of the responses already exist, but in a standard CAT, "item administration" would be between "item selection" and "estimation." The classic estimation mechanism is estimating $θ_{i}$ by maximizing the likelihood given the item parameters and a set of responses. Other estimation mechanisms correct for bias in the maximum likelihood estimate or add prior information (such as a prior distribution of $θ$). If an estimate is untenable (i.e. it returns a nonsensical value or $\infty$), the estimation procedure needs to have an alternative estimation mechanism. See mleEst for current estimation methods.
3. Termination: Either the test is terminated based on a pre-specified criterion/criteria, or no termination criterion is satisfied, in which case the loop repeats. The standard termination criteria involve a fixed criterion (e.g. administering only 50 items), or a variable criterion (e.g. continuing until the observed SEM is below .3). Other termination criteria relate to cut-point tests (e.g. certification tests, classification tests), which depend not solely on ability but on whether that ability is estimated to exceed a threshold. catIrt terminates classification tests based on either the Sequential Probability Ratio Test (SPRT) (see Eggen, 1999), the Generalized Likelihood Ratio (GLR) (see Thompson, 2009), or the Confidence Interval Method (see Kingsbury & Weiss, 1983). Essentially, the SPRT computes the ratio of two likelihoods (e.g. the likelihood of the data given being in one category vs the likelihood of the data given being in the other category, as defined by $B + δ$ and $B - δ$, where $B$ separates the categories and $δ$ is the half-width of the indifference region) and compares that ratio with a ratio of error rates ($α$ and $β$) (see Wald, 1945). The GLR uses the maximum likelihood estimate in place of either $B + δ$ or $B - δ$, and the confidence interval method terminates a CAT if the confidence interval surrounding an estimate of $θ$ is fully within one of the categories.
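
The three stages can be put together as a loop for a single simulee in base R. What follows is a minimal sketch under stated assumptions, not catIrt's internals: run_cat, p_brm, and fi_brm are hypothetical names, item selection is plain maximum Fisher information, estimation is a bounded maximum likelihood search via optimize, and termination combines a fixed maximum length with an SEM threshold.

```r
# Response probability and item information for the binary response model
# (assumed logistic form without a scaling constant; g = guessing).
p_brm <- function(theta, a, b, g = 0) g + (1 - g) * plogis(a * (theta - b))
fi_brm <- function(theta, a, b, g = 0) {
  p <- p_brm(theta, a, b, g)
  a^2 * ((1 - p) / p) * ((p - g) / (1 - g))^2
}

# Post-hoc CAT for one simulee: resp holds responses to the whole bank.
run_cat <- function(resp, params, init.theta = 0, range = c(-6, 6),
                    n.min = 5, n.max = 20, sem.crit = 0.3) {
  a <- params[, 1]; b <- params[, 2]; g <- params[, 3]
  used  <- integer(0)
  theta <- init.theta
  repeat {
    # 1. Item selection: most informative unused item at the current theta.
    info <- fi_brm(theta, a, b, g)
    info[used] <- -Inf
    used <- c(used, which.max(info))
    # 2. Estimation: maximise the log-likelihood within `range`, which keeps
    #    the estimate finite for all-correct or all-incorrect patterns.
    loglik <- function(th) {
      p <- p_brm(th, a[used], b[used], g[used])
      sum(resp[used] * log(p) + (1 - resp[used]) * log(1 - p))
    }
    theta <- optimize(loglik, interval = range, maximum = TRUE)$maximum
    # 3. Termination: stop at n.max, or once SEM <= sem.crit after n.min.
    sem <- 1 / sqrt(sum(fi_brm(theta, a[used], b[used], g[used])))
    if (length(used) >= n.max ||
        (length(used) >= n.min && sem <= sem.crit)) break
  }
  list(theta = theta, sem = sem, items = used)
}

set.seed(1)
params <- cbind(a = runif(50, .5, 1.5), b = rnorm(50), c = 0)
resp   <- rbinom(50, 1, p_brm(0.5, params[, "a"], params[, "b"]))
out    <- run_cat(resp, params)
```

Bounding the search interval is the simplest way to handle untenable estimates; the alternative scoring methods listed under score address the same problem in a more principled way.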

The CAT estimates $θ_{i1}$ (an initial point) based on init.theta, and terminates the entire simulation after sequentially terminating each simulee's CAT.

References

Eggen, T. J. H. M. (1999). Item selection in adaptive testing with the sequential probability ratio test. Applied Psychological Measurement, 23, 249-261.

Kingsbury, G. G., & Weiss, D. J. (1983). A comparison of IRT-based adaptive mastery testing and a sequential mastery testing procedure. In D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 257-283). New York, NY: Academic Press.

Thompson, N. A. (2009). Using the generalized likelihood ratio as a termination criterion. In D. J. Weiss (Ed.), Proceedings of the 2009 GMAC conference on computerized adaptive testing.

Wainer, H. (Ed.). (2000). Computerized Adaptive Testing: A Primer (2nd Edition). Mahwah, NJ: Lawrence Erlbaum Associates.

Wald, A. (1945). Sequential tests of statistical hypotheses. Annals of Mathematical Statistics, 16, 117-186.

Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21, 361-375.

Examples

# NOT RUN {
#########################
# Binary Response Model #
#########################
set.seed(888)
# generating random theta:
theta <- rnorm(50)
# generating an item bank under a 2-parameter binary response model:
b.params <- cbind(a = runif(100, .5, 1.5), b = rnorm(100, 0, 2), c = 0)
# simulating responses:
b.resp <- simIrt(theta = theta, params = b.params, mod = "brm")$resp


## CAT 1 ##
# the typical, classic post-hoc CAT:
catStart1 <- list(init.theta = 0, n.start = 5,
                  select = "UW-FI", at = "theta",
                  n.select = 4, it.range = c(-1, 1),
                  score = "step", range = c(-1, 1),
                  step.size = 3, leave.after.MLE = FALSE)
catMiddle1 <- list(select = "UW-FI", at = "theta",
                   n.select = 1, it.range = NULL,
                   score = "MLE", range = c(-6, 6),
                   expos = "none")
catTerm1 <- list(term = "fixed", n.min = 10, n.max = 50)

cat1 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle1,
               catTerm = catTerm1)

# we can print, summarize, and plot:
cat1                                        # prints theta because
                                            # we have fewer than
                                            # 200 simulees
summary(cat1, group = TRUE, ids = "none")   # nice summary!

summary(cat1, group = FALSE, ids = 1:4)     # summarizing people too! :)

par(mfrow = c(2, 2))
plot(cat1, ask = FALSE)               # 2-parameter model, so expected FI
                                      # and observed FI are the same
par(mfrow = c(1, 1))

# we can also plot particular simulees:
par(mfrow = c(2, 1))
plot(cat1, which = "none", ids = c(1, 30), ask = FALSE)
par(mfrow = c(1, 1))


## CAT 2 ##
# using Fixed Point KL info rather than Unweighted FI to select items:
catStart2 <- catStart1
catMiddle2 <- catMiddle1
catTerm2 <- catTerm1

catStart2$leave.after.MLE <- TRUE         # leave after mixed response pattern
catMiddle2$select <- "FP-KL"
catMiddle2$at <- "bounds"
catMiddle2$delta <- .2
catTerm2$c.term <- list(bounds = 0)
cat2 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart2,
               catMiddle = catMiddle2,
               catTerm = catTerm2)
cor(cat1$cat_theta, cat2$cat_theta)       # very close!

summary(cat2, group = FALSE, ids = 1:4)   # rarely 5 starting items!


## CAT 3/4 ##
# using "precision" rather than "fixed" to terminate:
catTerm1$term <- catTerm2$term <- "precision"
catTerm1$p.term <- catTerm2$p.term <- list(method = "threshold", crit = .3)
cat3 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle1,
               catTerm = catTerm1)
cat4 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart2,
               catMiddle = catMiddle2,
               catTerm = catTerm2)

mean(cat3$cat_length - cat4$cat_length) # KL info results in slightly more items


## CAT 5/6 ##
# classification CAT with a boundary of 0 (with default classification stuff):
catTerm5 <- list(term = "class", n.min = 10, n.max = 50,
                 c.term = list(method = "SPRT",
                               bounds = 0, delta = .2,
                               alpha = .10, beta = .10))
cat5 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle1,
               catTerm = catTerm5)
cat6 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle2,
               catTerm = catTerm5)

# how many were classified correctly?
mean(cat5$cat_categ == cat5$tot_categ)

# using a different selection mechanism, we get similar results:
mean(cat6$cat_categ == cat6$tot_categ)


## CAT 7 ##
# we could change estimation to EAP with the default (normal) prior:
catMiddle7 <- catMiddle1
catMiddle7$score <- "EAP"
cat7 <- catIrt(params = b.params, mod = "brm", # much slower!
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle7,
               catTerm = catTerm1)
cor(cat1$cat_theta, cat7$cat_theta)            # pretty much the same


## CAT 8 ##
# let's specify the prior as something strange:
cat8 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle7,
               catTerm = catTerm1,
               ddist = dchisq, df = 4)

cat8   # all positive values of "theta"


## CAT 9 ##
# finally, we can have:
#   - more than one termination criterion,
#   - individual bounds per person,
#   - simulating based on theta without a response matrix.
catTerm9 <- list(term = c("fixed", "class"),
                 n.min = 10, n.max = 50,
                 c.term = list(method = "SPRT",
                               bounds = cbind(runif(length(theta), -1, 0),
                                              runif(length(theta), 0, 1)),
                               delta = .2,
                               alpha = .1, beta = .1))
cat9 <- catIrt(params = b.params, mod = "brm",
               resp = NULL, theta = theta,
               catStart = catStart1,
               catMiddle = catMiddle1,
               catTerm = catTerm9)

summary(cat9)   # see "... with Each Termination Criterion"


#########################
# Graded Response Model #
#########################
# generating random theta
theta <- rnorm(201)
# generating an item bank under a graded response model:
g.params <- cbind(a = runif(100, .5, 1.5), b1 = rnorm(100), b2 = rnorm(100),
                                           b3 = rnorm(100), b4 = rnorm(100))

# the graded response model is exactly the same, only slower!
cat10 <- catIrt(params = g.params, mod = "grm",
                resp = NULL, theta = theta,
                catStart = catStart1,
                catMiddle = catMiddle1,
                catTerm = catTerm1)

# warning because it.range cannot be specified for graded response models!

# if there are more than 200 simulees, it doesn't print individual thetas:
cat10

# }
# NOT RUN {
# play around with things - CATs are fun - a little frisky, but fun.
# }
