mdirt
fits a variety of item response models with discrete latent variables.
These include, but are not limited to, latent class analysis, multidimensional latent
class models, multidimensional discrete latent class models, DINA/DINO models,
grade of measurement models, C-RUM, and so on. If response models are not defined explicitly
then customized models can defined using the createItem
function.
mdirt(
data,
model,
customTheta = NULL,
structure = NULL,
item.Q = NULL,
nruns = 1,
method = "EM",
covdata = NULL,
formula = NULL,
itemtype = "lca",
optimizer = "nlminb",
return_max = TRUE,
group = NULL,
GenRandomPars = FALSE,
verbose = TRUE,
pars = NULL,
technical = list(),
...
)
a matrix
or data.frame
that consists of
numerically ordered data, with missing data coded as NA
number of mutually exclusive classes to fit, or alternatively a more specific
mirt.model
definition (which reflects the so-called Q-matrix).
Note that when using a mirt.model
,
the order with which the syntax factors/attributes are defined are associated with the
columns in the customTheta
input
input passed to technical = list(customTheta = ...)
, but is included
directly in this function for convenience. This input is most interesting for discrete latent models
because it allows customized patterns of latent classes (i.e., defines the possible combinations
of the latent attribute profile). The default builds the pattern customTheta = diag(model)
,
which is the typical pattern for the traditional latent class analysis whereby class
membership mutually distinct and exhaustive. See thetaComb
for a quick method
to generate a matrix with all possible combinations
an R formula allowing the profile probability patterns (i.e., the structural component of
the model) to be fitted according to a log-linear model. When NULL
, all profile probabilities
(except one) will be estimated. Use of this input requires that the customTheta
input is supplied,
and that the column names in this matrix match the names found within this formula
a list of item-level Q-matrices indicating how the respective categories should be
modeled by the underlying attributes. Each matrix must represent a Theta
matrix; otherwise, a value ofNULL
will default
to a matrix consisting of 1's for each must
contain only 0's so that the first category represents the reference category for identification
a numeric value indicating how many times the model should be fit to the data
when using random starting values. If greater than 1, GenRandomPars
is set to true
by default
estimation method. Can be 'EM' or 'BL' (see mirt
for more details)
a data.frame of data used for latent regression models
an R formula (or list of formulas) indicating how the latent traits
can be regressed using external covariates in covdata
. If a named list
of formulas is supplied (where the names correspond to the latent trait/attribute names in model
)
then specific regression effects can be estimated for each factor. Supplying a single formula
will estimate the regression parameters for all latent variables by default
a vector indicating the itemtype associated with each item.
For discrete models this is limited to only 'lca' or items defined using a
createItem
definition
optimizer used for the M-step, set to 'nlminb'
by default.
See mirt
for more details
logical; when nruns > 1
, return the model that has the most optimal
maximum likelihood criteria? If FALSE, returns a list of all the estimated objects
a factor variable indicating group membership used for multiple group analyses
logical; use random starting values
logical; turn on messages to the R console
used for modifying starting values; see mirt
for details
list of lower-level inputs. See mirt
for details
additional arguments to be passed to the estimation engine. See mirt
for more details and examples
The latent class IRT model with two latent classes has the form
where the
When the item.Q
for is utilized, the above equation can be understood as
where by construction Q
is a item.Q
definition for each respective item.
Phil Chalmers rphilip.chalmers@gmail.com
Posterior classification accuracy for each response pattern may be obtained
via the fscores
function. The summary()
function will display
the category probability values given the class membership, which can also
be displayed graphically with plot()
, while coef()
displays the raw coefficient values (and their standard errors, if estimated). Finally,
anova()
is used to compare nested models, while
M2
and itemfit
may be used for model fitting purposes.
Chalmers, R., P. (2012). mirt: A Multidimensional Item Response Theory Package for the R Environment. Journal of Statistical Software, 48(6), 1-29.
Proctor, C. H. (1970). A probabilistic formulation and statistical analysis for Guttman scaling. Psychometrika, 35, 73-78. tools:::Rd_expr_doi("10.18637/jss.v048.i06")
thetaComb
, fscores
, mirt.model
, M2
,
itemfit
, boot.mirt
, mirtCluster
,
wald
, coef-method
, summary-method
,
anova-method
, residuals-method
# LSAT6 dataset
dat <- expand.table(LSAT6)
# fit with 2-3 latent classes
(mod2 <- mdirt(dat, 2))
if (FALSE) {
(mod3 <- mdirt(dat, 3))
summary(mod2)
residuals(mod2)
residuals(mod2, type = 'exp')
anova(mod2, mod3)
M2(mod2)
itemfit(mod2)
# generate classification plots
plot(mod2)
plot(mod2, facet_items = FALSE)
plot(mod2, profile = TRUE)
# available for polytomous data
mod <- mdirt(Science, 2)
summary(mod)
plot(mod)
plot(mod, profile=TRUE)
# classification based on response patterns
fscores(mod2, full.scores = FALSE)
# classify individuals either with the largest posterior probability.....
fs <- fscores(mod2)
head(fs)
classes <- 1:2
class_max <- classes[apply(apply(fs, 1, max) == fs, 1, which)]
table(class_max)
# ... or by probability sampling (i.e., plausible value draws)
class_prob <- apply(fs, 1, function(x) sample(1:2, 1, prob=x))
table(class_prob)
# plausible value imputations for stochastic classification in both classes
pvs <- fscores(mod2, plausible.draws=10)
tabs <- lapply(pvs, function(x) apply(x, 2, table))
tabs[[1]]
# fit with random starting points (run in parallel to save time)
if(interactive()) mirtCluster()
mod <- mdirt(dat, 2, nruns=10)
#--------------------------
# Grade of measurement model
# define a custom Theta grid for including a 'fuzzy' class membership
(Theta <- matrix(c(1, 0, .5, .5, 0, 1), nrow=3 , ncol=2, byrow=TRUE))
(mod_gom <- mdirt(dat, 2, customTheta = Theta))
summary(mod_gom)
#-----------------
# Multidimensional discrete latent class model
dat <- key2binary(SAT12,
key = c(1,4,5,2,3,1,2,1,3,1,2,4,2,1,5,3,4,4,1,4,3,3,4,1,3,5,1,3,1,5,4,5))
# define Theta grid for three latent classes
(Theta <- thetaComb(0:1, 3))
(mod_discrete <- mdirt(dat, 3, customTheta = Theta))
summary(mod_discrete)
# Located latent class model
model <- mirt.model('C1 = 1-32
C2 = 1-32
C3 = 1-32
CONSTRAIN = (1-32, a1), (1-32, a2), (1-32, a3)')
(mod_located <- mdirt(dat, model, customTheta = diag(3)))
summary(mod_located)
#-----------------
### DINA model example
# generate some suitable data for a two dimensional DINA application
# (first columns are intercepts)
set.seed(1)
Theta <- expand.table(matrix(c(1,0,0,0,
1,1,0,0,
1,0,1,0,
1,1,1,1), 4, 4, byrow=TRUE),
freq = c(200,200,100,500))
a <- matrix(c(rnorm(15, -1.5, .5), rlnorm(5, .2, .3), numeric(15), rlnorm(5, .2, .3),
numeric(15), rlnorm(5, .2, .3)), 15, 4)
guess <- plogis(a[11:15,1]) # population guess
slip <- 1 - plogis(rowSums(a[11:15,])) # population slip
dat <- simdata(a, Theta=Theta, itemtype = 'lca')
# first column is the intercept, 2nd and 3rd are attributes
theta <- cbind(1, thetaComb(0:1, 2))
theta <- cbind(theta, theta[,2] * theta[,3]) #DINA interaction of main attributes
model <- mirt.model('Intercept = 1-15
A1 = 1-5
A2 = 6-10
A1A2 = 11-15')
# last 5 items are DINA (first 10 are unidimensional C-RUMs)
DINA <- mdirt(dat, model, customTheta = theta)
coef(DINA, simplify=TRUE)
summary(DINA)
M2(DINA) # fits well (as it should)
cfs <- coef(DINA, simplify=TRUE)$items[11:15,]
cbind(guess, estguess = plogis(cfs[,1]))
cbind(slip, estslip = 1 - plogis(rowSums(cfs)))
### DINO model example
theta <- cbind(1, thetaComb(0:1, 2))
# define theta matrix with negative interaction term
(theta <- cbind(theta, -theta[,2] * theta[,3]))
model <- mirt.model('Intercept = 1-15
A1 = 1-5, 11-15
A2 = 6-15
Yoshi = 11-15
CONSTRAIN = (11,a2,a3,a4), (12,a2,a3,a4), (13,a2,a3,a4),
(14,a2,a3,a4), (15,a2,a3,a4)')
# last five items are DINOs (first 10 are unidimensional C-RUMs)
DINO <- mdirt(dat, model, customTheta = theta)
coef(DINO, simplify=TRUE)
summary(DINO)
M2(DINO) #doesn't fit as well, because not the generating model
## C-RUM (analogous to MIRT model)
theta <- cbind(1, thetaComb(0:1, 2))
model <- mirt.model('Intercept = 1-15
A1 = 1-5, 11-15
A2 = 6-15')
CRUM <- mdirt(dat, model, customTheta = theta)
coef(CRUM, simplify=TRUE)
summary(CRUM)
# good fit, but over-saturated (main effects for items 11-15 can be set to 0)
M2(CRUM)
#------------------
# multidimensional latent class model
dat <- key2binary(SAT12,
key = c(1,4,5,2,3,1,2,1,3,1,2,4,2,1,5,3,4,4,1,4,3,3,4,1,3,5,1,3,1,5,4,5))
# 5 latent classes within 2 different sets of items
model <- mirt.model('C1 = 1-16
C2 = 1-16
C3 = 1-16
C4 = 1-16
C5 = 1-16
C6 = 17-32
C7 = 17-32
C8 = 17-32
C9 = 17-32
C10 = 17-32
CONSTRAIN = (1-16, a1), (1-16, a2), (1-16, a3), (1-16, a4), (1-16, a5),
(17-32, a6), (17-32, a7), (17-32, a8), (17-32, a9), (17-32, a10)')
theta <- diag(10) # defined explicitly. Otherwise, this profile is assumed
mod <- mdirt(dat, model, customTheta = theta)
coef(mod, simplify=TRUE)
summary(mod)
#------------------
# multiple group with constrained group probabilities
dat <- key2binary(SAT12,
key = c(1,4,5,2,3,1,2,1,3,1,2,4,2,1,5,3,4,4,1,4,3,3,4,1,3,5,1,3,1,5,4,5))
group <- rep(c('G1', 'G2'), each = nrow(SAT12)/2)
Theta <- diag(2)
# the latent class parameters are technically located in the (nitems + 1) location
model <- mirt.model('A1 = 1-32
A2 = 1-32
CONSTRAINB = (33, c1)')
mod <- mdirt(dat, model, group = group, customTheta = Theta)
coef(mod, simplify=TRUE)
summary(mod)
#------------------
# Probabilistic Guttman Model (Proctor, 1970)
# example analysis can also be found in the sirt package (see ?prob.guttman)
data(data.read, package = 'sirt')
head(data.read)
Theta <- matrix(c(1,0,0,0,
1,1,0,0,
1,1,1,0,
1,1,1,1), 4, byrow=TRUE)
model <- mirt.model("INTERCEPT = 1-12
C1 = 1,7,9,11
C2 = 2,5,8,10,12
C3 = 3,4,6")
mod <- mdirt(data.read, model, customTheta=Theta)
summary(mod)
M2(mod)
itemfit(mod)
}
Run the code above in your browser using DataLab