catIrt simulates Computerized Adaptive Tests (CATs) given a vector/matrix of
responses or a vector of ability values, a matrix of item parameters, and several
item selection mechanisms, estimation procedures, and termination criteria.
catIrt( params, mod = c("brm", "grm"),
        resp = NULL,
        theta = NULL,
        catStart = list( n.start = 5, init.theta = 0,
                         select = c("UW-FI", "LW-FI", "PW-FI",
                                    "FP-KL", "VP-KL", "FI-KL", "VI-KL",
                                    "random"),
                         at = c("theta", "bounds"),
                         it.range = NULL, n.select = 1,
                         delta = .1,
                         score = c("fixed", "step", "random", "WLE", "BME", "EAP"),
                         range = c(-1, 1),
                         step.size = 3, leave.after.MLE = FALSE ),
        catMiddle = list( select = c("UW-FI", "LW-FI", "PW-FI",
                                     "FP-KL", "VP-KL", "FI-KL", "VI-KL",
                                     "random"),
                          at = c("theta", "bounds"),
                          it.range = NULL, n.select = 1,
                          delta = .1,
                          score = c("MLE", "WLE", "BME", "EAP"),
                          range = c(-6, 6),
                          expos = c("none", "SH") ),
        catTerm = list( term = c("fixed", "precision", "info", "class"),
                        score = c("MLE", "WLE", "BME", "EAP"),
                        n.min = 5, n.max = 50,
                        p.term = list(method = c("threshold", "change"),
                                      crit = .25),
                        i.term = list(method = c("threshold", "change"),
                                      crit = 2),
                        c.term = list(method = c("SPRT", "GLR", "CI"),
                                      bounds = c(-1, 1),
                                      categ = c(0, 1, 2),
                                      delta = .1,
                                      alpha = .05, beta = .05,
                                      conf.lev = .95) ),
        ddist = dnorm,
        progress = TRUE, ... )
# S3 method for catIrt
summary( object, group = TRUE, ids = "none", ... )
# S3 method for catIrt
plot( x, which = "all", ids = "none",
      conf.lev = .95, legend = TRUE, ask = TRUE, ... )
object, x
a catIrt object.
params
numeric: a matrix of item parameters. If specified as a matrix,
the rows must index the items, and the columns must designate the item
parameters. For the binary response model, params must either
be a 3-column matrix (if not using item exposure control), a 4-5-column
matrix (with Sympson-Hetter parameters as the last column if using item
exposure control), or a 4-5-column matrix (if including the item number
as the first column). See Details for more information.
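As a minimal sketch of the layouts described above for the binary response model, the following base-R code builds the same hypothetical item bank three ways. The bank itself and the column labels SH and item are assumptions made for illustration; they are not names required by catIrt.

```r
# Hypothetical 10-item bank for the binary response model.
set.seed(1)
n_items <- 10

# Three columns: discrimination (a), difficulty (b), pseudo-guessing (c).
params3 <- cbind(a = runif(n_items, 0.5, 1.5),
                 b = rnorm(n_items),
                 c = 0.2)

# Same bank with (made-up) Sympson-Hetter probabilities as the LAST column.
params_sh <- cbind(params3, SH = runif(n_items, 0.5, 1))

# Same bank with the item number as the FIRST column.
params_id <- cbind(item = seq_len(n_items), params3)

dim(params3)
dim(params_sh)
dim(params_id)
```

Rows index items and columns index parameters in all three matrices, which is the orientation the description above calls for.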
mod
character: a character string indicating the IRT model. Current support
is for the 3-parameter binary response model ("brm")
and Samejima's graded response model ("grm"). The contents
of params must match the designation of mod. If mod is
left blank, it will be designated the class of resp (if resp inherits
either "brm" or "grm"), and if that fails, it will ask the user (if in
interactive mode) or error.
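resp
numeric: either a matrix or vector of responses, or NULL if specifying thetas.
For the binary response model ("brm"), resp must solely contain 0s
and 1s. For the graded response model ("grm"), resp must solely contain
integers, with the number of response categories determined by params.
theta
numeric: either a vector of ability values (one per simulee), or NULL if specifying resp.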
catStart
list: a list of options for starting the CAT including:
n.start
: a scalar indicating the number of items that are used for each
simulee at the beginning of the CAT. After the number of administered items
reaches n.start, the CAT will shift to the middle set of parameters.
init.theta
: a scalar or vector of initial starting estimates of θ. If init.theta
is a scalar, every simulee will have the same starting value.
Otherwise, simulees will have different starting values based on the respective element
of init.theta.
select
: a character string indicating the item selection method for the
first few items. Items can be selected either through maximum Fisher information or
Kullback-Leibler divergence methods or randomly. The Fisher information methods include
"UW-FI": unweighted Fisher information at a point.
"LW-FI": Fisher information weighted across the likelihood function.
"PW-FI": Fisher information weighted across the posterior distribution of θ.
And the Kullback-Leibler divergence methods include
"FP-KL": pointwise KL divergence between [P +/- delta], where
P is either the current θ estimate or a classification bound.
"VP-KL": pointwise KL divergence between [P +/- delta/sqrt(n)], where n is the number of items given to this point in the CAT.
"FI-KL": KL divergence integrated along [P -/+ delta] with respect to P.
"VI-KL": KL divergence integrated along [P -/+ delta/sqrt(n)] with respect to P.
See itChoose for more information.
at
: a character string indicating where to select items. If select
is "UW-FI" and at is "theta", then items will be selected
to maximize Fisher information at the proximate θ estimates.
it.range
: Either a 2-element numeric vector indicating the minimum and maximum
allowed difficulty parameters for items selected during the starting portion of the CAT
(only if mod
is equal to "brm") or NULL indicating no item parameter
restrictions. See itChoose
for more information.
n.select
: an integer indicating the number of items to select at one time.
For instance, if select is "UW-FI", at is "theta", and
n.select is 5, the item choosing function will randomly select between the top
5 items that maximize expected Fisher information at the proximate θ estimates.
delta
: a scalar indicating the multiplier used in initial item selection if
a Kullback-Leibler method is chosen.
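score
: a character string indicating the θ estimation method for the first few
items. The options are "fixed" (at init.theta), "step" (by adding or subtracting
step.size), "random", Weighted Likelihood Estimation ("WLE"), Bayesian Modal
Estimation ("BME"), and Expected A Posteriori Estimation ("EAP"). The Bayesian
methods ("BME" and "EAP") use prior distributions specified through density
(d...) functions. See mleEst for more information.
range
: a 2-element numeric vector indicating the minimum and maximum values that
θ estimates may take during the starting portion of the CAT.
step.size
: a scalar indicating how much to increment or decrement the
estimate of θ if score is set to "step".
leave.after.MLE
: a logical indicating whether to skip the remainder of the starting
items if the user has a mixed response pattern and/or a finite maximum likelihood estimate
of θ.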
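To make the interplay of select = "UW-FI", at = "theta", and n.select concrete, here is a minimal base-R sketch. It is not the itChoose implementation: the helper names p_brm, fi_brm, and pick_item are hypothetical, and the response function is assumed to be the logistic 3-parameter form without a 1.7 scaling constant.

```r
# Assumed 3PL response probability (logistic, no scaling constant).
p_brm <- function(theta, a, b, c) c + (1 - c) / (1 + exp(-a * (theta - b)))

# Expected Fisher information of each item at a single theta value.
fi_brm <- function(theta, a, b, c) {
  p <- p_brm(theta, a, b, c)
  a^2 * ((1 - p) / p) * ((p - c) / (1 - c))^2
}

# Pick one item at random from the n_select most informative unused items.
pick_item <- function(theta, params, used = integer(0), n_select = 1) {
  info <- fi_brm(theta, params[, "a"], params[, "b"], params[, "c"])
  info[used] <- -Inf                       # never re-administer an item
  top <- order(info, decreasing = TRUE)[seq_len(n_select)]
  top[sample.int(length(top), 1)]          # safe even when length(top) == 1
}

# Tiny hypothetical bank: three items that differ only in difficulty.
bank <- cbind(a = 1, b = c(-2, 0, 2), c = 0)
pick_item(theta = 0.1, params = bank)             # item 2 (b closest to theta)
pick_item(theta = 0.1, params = bank, used = 2)   # item 3 (next closest)
```

With n_select greater than 1 the choice becomes random among the top candidates, which is the behavior the n.select description gives for the value 5.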
catMiddle
list: a list of options for selecting/scoring during the middle of the CAT, including:
select
: a character string indicating the item selection method for the
remaining items. See select
in catStart
for an explanation
of the options.
at
: a character string indicating where to select items. See at in
catStart for an explanation of the options.
it.range
: Either a 2-element numeric vector indicating the minimum and maximum
allowed difficulty parameters for items selected during the middle portion of the CAT
(only if mod
is equal to "brm") or NULL indicating no item parameter
restrictions. See itChoose
for more information.
n.select
: an integer indicating the number of items to select at one time.
delta
: a scalar indicating the multiplier used in middle item selection if
a Kullback-Leibler method is chosen.
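score
: a character string indicating the θ estimation method for the middle
portion of the CAT. The options are Maximum Likelihood Estimation ("MLE"),
Weighted Likelihood Estimation ("WLE"), Bayesian Modal Estimation ("BME"), and
Expected A Posteriori Estimation ("EAP"). The Bayesian methods ("BME" and "EAP")
use prior distributions specified through density (d...) functions. See
mleEst for more information.
range
: a 2-element numeric vector indicating the minimum and maximum values that
θ estimates may take during the middle portion of the CAT.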
expos
: a character string indicating whether no item exposure controls should be
implemented ("none") or whether the CAT should use Sympson-Hetter exposure
controls ("SH"). If (and only if) expos
is equal to "SH",
the last column of the parameter matrix should indicate the probability of an item
being administered given that it is selected.
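The following base-R sketch illustrates the idea behind expos = "SH": each selected item is administered only with the probability stored for it. The helper administer_sh is hypothetical, and the rule that a rejected item is skipped in favor of the next-ranked candidate is an assumption made for illustration, not a statement about how catIrt handles rejections.

```r
# ranked_items: candidate item indices, most preferred first.
# sh_prob:      P(administer | selected) for every item in the bank.
administer_sh <- function(ranked_items, sh_prob) {
  for (item in ranked_items) {
    # Administer with probability sh_prob[item]; otherwise try the next one.
    if (runif(1) < sh_prob[item]) return(item)
  }
  NA_integer_  # every candidate was rejected
}

# Item 1 is never administered (probability 0); item 2 always is.
administer_sh(ranked_items = c(1, 2, 3), sh_prob = c(0, 1, 1))   # 2
```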
catTerm
list: a list of options for stopping/terminating the CAT, including:
term
: a scalar/vector indicating the termination criterion/criteria. CATs can
be terminated either through a fixed number of items ("fixed") declared
through the n.max argument; related to SEM of a simulee ("precision")
declared through the p.term argument; related to the test information of a
simulee at a particular point in the CAT ("info") declared through the
i.term argument; and/or when a simulee falls into a category ("class")
declared through the c.term argument. If more than one termination criterion
is selected, the CAT will terminate after successfully satisfying
the first of those for a given simulee.
score
: a character string indicating the θ estimation method used with the
entire item bank. The options of score in catTerm are identical to the
options of score in catMiddle.
n.min
: an integer indicating the minimum number of items that a simulee
should "take" before any of the termination criteria are checked.
n.max
: an integer indicating the maximum number of items to administer
before terminating the CAT.
p.term
: a list indicating the parameters of a precision-based stopping rule,
only if term
is "precision", including:
method
: a character string indicating whether to terminate the CAT when the
SEM dips below a threshold ("threshold") or changes less than a particular
amount ("change").
crit
: a scalar indicating either the maximum SEM of a simulee before
terminating the CAT or the maximum change in the simulee's SEM before terminating
the CAT.
i.term
: a list indicating the parameters of an information-based stopping rule,
only if term is "info", including:
method
: a character string indicating whether to terminate the CAT when FI
exceeds a threshold ("threshold") or changes less than a particular
amount ("change").
crit
: a scalar indicating either the minimum FI of a simulee before
terminating the CAT or the maximum change in the simulee's FI before terminating
the CAT.
c.term
: a list indicating the parameters of a classification CAT, only if
term
is "class" or any of the selection methods are at
one or more "bounds", including:
method
: a scalar indicating the method used for a classification CAT. As of
now, the classification CAT options are the Sequential Probability Ratio Test
("SPRT"), the Generalized Likelihood Ratio ("GLR"), or the Confidence
Interval method ("CI").
bounds
: a scalar, vector, or matrix of classification bounds. If specified as a
scalar, there will be one bound for each simulee at that value. If specified as a
matrix with one row per simulee (as in the examples), each simulee has its own set of bounds.
categ
: a vector indicating the names of the categories into which the simulees
should be classified. The length of categ
should be one greater than the length
of bounds
.
delta
: a scalar indicating the half-width of an indifference region when performing
an SPRT-based classification CAT or selecting items by Kullback-Leibler divergence. See
Eggen (1999) and KL
for more information.
alpha
: a scalar indicating the specified Type I error rate for performing an
SPRT-based classification CAT.
beta
: a scalar indicating the specified Type II error rate for performing an
SPRT-based classification CAT.
conf.lev
: a scalar between 0 and 1 indicating the confidence level used when performing
a confidence-based ("CI") classification CAT.
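To show how bounds, delta, alpha, and beta fit together in an SPRT-based classification CAT, here is a minimal base-R sketch using Wald's standard decision thresholds. It is an illustration under stated assumptions rather than the package's code: sprt_decision is a hypothetical helper, the response function is the logistic 3-parameter form without a scaling constant, and the labels "above", "below", and "continue" are invented for this example.

```r
sprt_decision <- function(resp, a, b, c, bound = 0, delta = 0.1,
                          alpha = 0.05, beta = 0.05) {
  # Log-likelihood of the observed responses at a given theta.
  loglik <- function(theta) {
    p <- c + (1 - c) / (1 + exp(-a * (theta - b)))
    sum(resp * log(p) + (1 - resp) * log(1 - p))
  }
  # Compare the two edges of the indifference region around the bound.
  llr <- loglik(bound + delta) - loglik(bound - delta)
  upper <- log((1 - beta) / alpha)   # classify above the bound
  lower <- log(beta / (1 - alpha))   # classify below the bound
  if (llr >= upper) "above" else if (llr <= lower) "below" else "continue"
}

# Five identical hypothetical items (a = 2, b = 0, c = 0), delta = 0.5:
# each correct answer adds exactly 1 to the log-likelihood ratio.
a <- rep(2, 5); b <- rep(0, 5); g <- rep(0, 5)
sprt_decision(rep(1, 5), a, b, g, delta = 0.5)   # "above"
sprt_decision(rep(0, 5), a, b, g, delta = 0.5)   # "below"
sprt_decision(c(1, 0), a[1:2], b[1:2], g[1:2], delta = 0.5)   # "continue"
```

A wider delta or larger error rates make the thresholds easier to cross, so the test tends to stop earlier at the cost of more misclassification.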
ddist
function: a function indicating how to calculate prior densities
for Bayesian estimation or particular item selection methods. For instance,
if you wish to specify a normal prior, ddist = dnorm, and if you wish
to specify a uniform prior, ddist = dunif. Note that it is standard in
R to use d... to indicate a density. See itChoose for
more information.
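The sketch below shows, in base R, how a density function plus named distribution parameters can act as a prior in the way ddist and ... are described. The helper eap_est is hypothetical (a simple grid-based EAP under an assumed logistic 3-parameter model), not the estimator used by mleEst.

```r
eap_est <- function(resp, a, b, c, ddist = dnorm, range = c(-6, 6), ...) {
  grid <- seq(range[1], range[2], length.out = 201)
  # Likelihood of the response pattern at every grid point.
  lik <- vapply(grid, function(th) {
    p <- c + (1 - c) / (1 + exp(-a * (th - b)))
    prod(p^resp * (1 - p)^(1 - resp))
  }, numeric(1))
  # Extra named arguments are forwarded to the density (e.g. df = 4).
  post <- lik * ddist(grid, ...)
  sum(grid * post) / sum(post)   # posterior mean on the grid
}

resp <- c(0, 0, 0)
a <- rep(1, 3); b <- rep(0, 3); g <- rep(0, 3)
eap_est(resp, a, b, g)                              # default normal prior
eap_est(resp, a, b, g, ddist = dunif, min = -1, max = 1)
eap_est(resp, a, b, g, ddist = dchisq, df = 4)      # prior mass only on theta > 0
```

Because dchisq puts no mass on negative values, the last estimate is positive even for an all-incorrect pattern, which mirrors the "all positive values" remark in the CAT 8 example.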
which
numeric: a scalar or vector of integers between 1 and 4, indicating which plots to include. The plots are as follows:
1. Bank Information
2. Bank SEM
3. CAT Information
4. CAT SEM
which can also be "none", in which case plot.catIrt will
not plot any information functions, or it can be "all", in which case
plot.catIrt will plot all four information functions.
group
logical: TRUE or FALSE indicating whether to display a summary at the group level.
ids
numeric: a scalar or vector of integers between 1 and the number of
simulees indicating which simulees to plot and/or summarize their CAT
process and all of their θ estimates. ids can
also be "none" (or, equivalently, NULL) or "all".
conf.lev
numeric: a scalar between 0 and 1 indicating the desired confidence
level plotted for the individual θ estimates.
legend
logical: TRUE or FALSE indicating whether the plot function should display a legend on the plot.
ask
logical: TRUE or FALSE indicating whether the plot function should ask between plots.
progress
logical: TRUE or FALSE indicating whether the catIrt function
should display a progress bar during the CAT.
...
arguments passed to ddist or plot.catIrt, usually distribution
parameters identified by name or graphical parameters.
The function catIrt returns a list (of class "catIrt") with the following elements:
a vector of final CAT θ estimates.
a vector indicating the final classification of each simulee in the CAT. If
term is not "class", cat_categ will be a vector
of NA values.
a vector of observed Fisher information based on the final CAT θ estimates.
a vector of observed SEM estimates (or posterior standard deviations) based on the
final CAT θ estimates.
a vector indicating the number of items administered to each simulee in the CAT.
a vector indicating how each CAT was terminated.
a vector of θ estimates given the entire item bank.
a vector indicating the classification of each simulee given the entire item bank.
a vector of observed Fisher information based on the entire item bank worth of responses.
a vector of observed SEM estimates based on the entire item bank worth of responses.
a vector of true θ values.
a vector of true classification given θ.
the full item bank.
the full set of responses.
a list of
a list of model specifications, as designated by the user, so that the CAT can be easily reproduced.
The function catIrt performs a post-hoc computerized adaptive test (CAT),
with a variety of user specified inputs. For a given person/simulee (e.g. simulee i),
a CAT is a set of stages repeated inside a while loop
(e.g. Weiss and Kingsbury, 1984):
Item Selection: The next item is chosen based on a pre-specified criterion/criteria.
For example, the classic item selection mechanism is picking an item such that it
maximizes Fisher Information at the current estimate of θ. See itChoose
for current item selection methods.
Estimation: θ is re-estimated using the response to the item just selected. See mleEst for
current estimation methods.
Termination: Either the test is terminated based on a pre-specified criterion/criteria,
or no termination criterion is satisfied, in which case the loop repeats. The standard
termination criteria involve a fixed criterion (e.g. administering only 50 items),
or a variable criterion (e.g. continuing until the observed SEM is below .3). Other
termination criteria relate to cut-point tests (e.g. certification tests, classification tests),
that depend not solely on ability but on whether that ability is estimated to exceed a threshold.
catIrt terminates classification tests based on either the Sequential Probability Ratio Test
(SPRT) (see Eggen, 1999), the Generalized Likelihood Ratio (GLR) (see Thompson, 2009), or the
Confidence Interval Method (see Kingsbury & Weiss, 1983). Essentially, the SPRT compares the ratio
of two likelihoods (e.g. the likelihood of the data given being in one category vs the likelihood
of the data given being in the other category, as defined by the classification bound plus or minus
delta, the half-width of the indifference region) to thresholds based on the specified error
rates alpha and beta (see Wald, 1945).
The CAT starts each simulee's θ estimate at init.theta,
and terminates the entire simulation after sequentially terminating each simulee's CAT.
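The select, estimate, terminate loop described above can be sketched for a single simulee in base R. This is an illustration, not the catIrt algorithm: run_cat and p_brm are hypothetical helpers, selection is maximum Fisher information at the current estimate, estimation is a grid-based EAP with a standard normal prior (chosen here so the estimate stays finite for all-correct or all-incorrect patterns), and termination is either a fixed maximum length or an SEM threshold.

```r
# Assumed 3PL response probability (logistic, no scaling constant).
p_brm <- function(theta, a, b, c) c + (1 - c) / (1 + exp(-a * (theta - b)))

run_cat <- function(resp, params, n_max = 20, sem_crit = 0.3, init_theta = 0) {
  a <- params[, "a"]; b <- params[, "b"]; g <- params[, "c"]
  n_max <- min(n_max, nrow(params))
  grid <- seq(-6, 6, by = 0.01)
  used <- integer(0)
  theta_hat <- init_theta
  sem <- Inf
  # Termination: stop at n_max items or once the SEM drops to sem_crit.
  while (length(used) < n_max && sem > sem_crit) {
    # Item selection: most informative unused item at the current estimate.
    p <- p_brm(theta_hat, a, b, g)
    info <- a^2 * ((1 - p) / p) * ((p - g) / (1 - g))^2
    info[used] <- -Inf
    used <- c(used, which.max(info))
    # Estimation: posterior mean and SD over the grid.
    loglik <- vapply(grid, function(th) {
      pu <- p_brm(th, a[used], b[used], g[used])
      sum(resp[used] * log(pu) + (1 - resp[used]) * log(1 - pu))
    }, numeric(1))
    post <- exp(loglik - max(loglik)) * dnorm(grid)
    post <- post / sum(post)
    theta_hat <- sum(grid * post)
    sem <- sqrt(sum((grid - theta_hat)^2 * post))
  }
  list(theta = theta_hat, sem = sem, items = used)
}

# Post-hoc use: responses to the whole (hypothetical) bank already exist.
set.seed(42)
bank <- cbind(a = runif(50, 0.5, 1.5), b = rnorm(50), c = 0)
resp <- rbinom(50, 1, p_brm(1, bank[, "a"], bank[, "b"], bank[, "c"]))
out <- run_cat(resp, bank, n_max = 20, sem_crit = 0.3)
length(out$items)   # number of items administered
```

Only the responses to administered items enter the estimate, which is what makes the simulation post-hoc: the full response vector is known in advance but revealed one item at a time.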
Eggen, T. J. H. M. (1999). Item selection in adaptive testing with the sequential probability ratio test. Applied Psychological Measurement, 23, 249-261.
Kingsbury, G. G., & Weiss, D. J. (1983). A comparison of IRT-based adaptive mastery testing and a sequential mastery testing procedure. In D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 257-283). New York, NY: Academic Press.
Thompson, N. A. (2009). Using the generalized likelihood ratio as a termination criterion. In D. J. Weiss (Ed.), Proceedings of the 2009 GMAC conference on computerized adaptive testing.
Wainer, H. (Ed.). (2000). Computerized Adaptive Testing: A Primer (2nd Edition). Mahwah, NJ: Lawrence Erlbaum Associates.
Wald, A. (1945). Sequential tests of statistical hypotheses. Annals of Mathematical Statistics, 16, 117-186.
Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21, 361-375.
# NOT RUN {
#########################
# Binary Response Model #
#########################
set.seed(888)
# generating random theta:
theta <- rnorm(50)
# generating an item bank under a 2-parameter binary response model:
b.params <- cbind(a = runif(100, .5, 1.5), b = rnorm(100, 0, 2), c = 0)
# simulating responses:
b.resp <- simIrt(theta = theta, params = b.params, mod = "brm")$resp
## CAT 1 ##
# the typical, classic post-hoc CAT:
catStart1 <- list(init.theta = 0, n.start = 5,
select = "UW-FI", at = "theta",
n.select = 4, it.range = c(-1, 1),
score = "step", range = c(-1, 1),
step.size = 3, leave.after.MLE = FALSE)
catMiddle1 <- list(select = "UW-FI", at = "theta",
n.select = 1, it.range = NULL,
score = "MLE", range = c(-6, 6),
expos = "none")
catTerm1 <- list(term = "fixed", n.min = 10, n.max = 50)
cat1 <- catIrt(params = b.params, mod = "brm",
resp = b.resp,
catStart = catStart1,
catMiddle = catMiddle1,
catTerm = catTerm1)
# we can print, summarize, and plot:
cat1 # prints theta because
# we have fewer than
# 200 simulees
summary(cat1, group = TRUE, ids = "none") # nice summary!
summary(cat1, group = FALSE, ids = 1:4) # summarizing people too! :)
par(mfrow = c(2, 2))
plot(cat1, ask = FALSE) # 2-parameter model, so expected FI
# and observed FI are the same
par(mfrow = c(1, 1))
# we can also plot particular simulees:
par(mfrow = c(2, 1))
plot(cat1, which = "none", ids = c(1, 30), ask = FALSE)
par(mfrow = c(1, 1))
## CAT 2 ##
# using Fixed Point KL info rather than Unweighted FI to select items:
catStart2 <- catStart1
catMiddle2 <- catMiddle1
catTerm2 <- catTerm1
catStart2$leave.after.MLE <- TRUE # leave after mixed response pattern
catMiddle2$select <- "FP-KL"
catMiddle2$at <- "bounds"
catMiddle2$delta <- .2
catTerm2$c.term <- list(bounds = 0)
cat2 <- catIrt(params = b.params, mod = "brm",
resp = b.resp,
catStart = catStart2,
catMiddle = catMiddle2,
catTerm = catTerm2)
cor(cat1$cat_theta, cat2$cat_theta) # very close!
summary(cat2, group = FALSE, ids = 1:4) # rarely 5 starting items!
## CAT 3/4 ##
# using "precision" rather than "fixed" to terminate:
catTerm1$term <- catTerm2$term <- "precision"
catTerm1$p.term <- catTerm2$p.term <- list(method = "threshold", crit = .3)
cat3 <- catIrt(params = b.params, mod = "brm",
resp = b.resp,
catStart = catStart1,
catMiddle = catMiddle1,
catTerm = catTerm1)
cat4 <- catIrt(params = b.params, mod = "brm",
resp = b.resp,
catStart = catStart2,
catMiddle = catMiddle2,
catTerm = catTerm2)
mean(cat3$cat_length - cat4$cat_length) # KL info results in slightly more items
## CAT 5/6 ##
# classification CAT with a boundary of 0 (with default classification stuff):
catTerm5 <- list(term = "class", n.min = 10, n.max = 50,
c.term = list(method = "SPRT",
bounds = 0, delta = .2,
alpha = .10, beta = .10))
cat5 <- catIrt(params = b.params, mod = "brm",
resp = b.resp,
catStart = catStart1,
catMiddle = catMiddle1,
catTerm = catTerm5)
cat6 <- catIrt(params = b.params, mod = "brm",
resp = b.resp,
catStart = catStart1,
catMiddle = catMiddle2,
catTerm = catTerm5)
# how many were classified correctly?
mean(cat5$cat_categ == cat5$tot_categ)
# using a different selection mechanism, we get similar results:
mean(cat6$cat_categ == cat6$tot_categ)
## CAT 7 ##
# we could change estimation to EAP with the default (normal) prior:
catMiddle7 <- catMiddle1
catMiddle7$score <- "EAP"
cat7 <- catIrt(params = b.params, mod = "brm", # much slower!
resp = b.resp,
catStart = catStart1,
catMiddle = catMiddle7,
catTerm = catTerm1)
cor(cat1$cat_theta, cat7$cat_theta) # pretty much the same
## CAT 8 ##
# let's specify the prior as something strange:
cat8 <- catIrt(params = b.params, mod = "brm",
resp = b.resp,
catStart = catStart1,
catMiddle = catMiddle7,
catTerm = catTerm1,
ddist = dchisq, df = 4)
cat8 # all positive values of "theta"
## CAT 9 ##
# finally, we can have:
# - more than one termination criterion,
# - individual bounds per person,
# - simulating based on theta without a response matrix.
catTerm9 <- list(term = c("fixed", "class"),
n.min = 10, n.max = 50,
c.term = list(method = "SPRT",
bounds = cbind(runif(length(theta), -1, 0),
runif(length(theta), 0, 1)),
delta = .2,
alpha = .1, beta = .1))
cat9 <- catIrt(params = b.params, mod = "brm",
resp = NULL, theta = theta,
catStart = catStart1,
catMiddle = catMiddle1,
catTerm = catTerm9)
summary(cat9) # see "... with Each Termination Criterion"
#########################
# Graded Response Model #
#########################
# generating random theta
theta <- rnorm(201)
# generating an item bank under a graded response model:
g.params <- cbind(a = runif(100, .5, 1.5), b1 = rnorm(100), b2 = rnorm(100),
b3 = rnorm(100), b4 = rnorm(100))
# the graded response model is exactly the same, only slower!
cat10 <- catIrt(params = g.params, mod = "grm",
resp = NULL, theta = theta,
catStart = catStart1,
catMiddle = catMiddle1,
catTerm = catTerm1)
# warning because it.range cannot be specified for graded response models!
# if there are more than 200 simulees, it doesn't print individual thetas:
cat10
# }
# NOT RUN {
# play around with things - CATs are fun - a little frisky, but fun.
# }