
mirt (version 0.8.0)

mirt: Full-Information Item Factor Analysis (Multidimensional Item Response Theory)

Description

mirt fits an unconditional maximum likelihood factor analysis model to dichotomous and polytomous data under the item response theory paradigm. It fits univariate and multivariate Rasch, 1-4PL, graded, (generalized) partial credit, nominal, graded rating scale, Rasch rating scale, nested logistic, and partially compensatory models using the EM algorithm. User-defined item classes can also be created with the createItem function. Models may also contain 'explanatory' person or item level predictors, though these can only be included through the mixedmirt function.
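
As a quick orientation, the most common call patterns look like the following (a minimal sketch; data stands for any suitably coded response matrix):

#unidimensional model with the default graded/2PL item types
mod1 <- mirt(data, 1)

#two-factor exploratory model
mod2 <- mirt(data, 2)

#unidimensional generalized partial credit model
modgpcm <- mirt(data, 1, itemtype = 'gpcm')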

Usage

mirt(data, model, itemtype = NULL, guess = 0, upper = 1,
    SE = FALSE, SE.type = 'SEM', pars = NULL, constrain =
    NULL, parprior = NULL, calcNull = TRUE, rotate =
    'oblimin', Target = NaN, quadpts = NULL, grsm.block =
    NULL, rsm.block = NULL, key= NULL, nominal.highlow =
    NULL, cl = NULL, large = FALSE, verbose = TRUE,
    technical = list(), ...)

  ## S3 method for class 'ExploratoryClass':
summary(object, rotate = '',
    Target = NULL, suppress = 0, digits = 3, verbose =
    TRUE, ...)

  ## S3 method for class 'ExploratoryClass':
coef(object, rotate = '',
    Target = NULL, digits = 3, verbose = TRUE, ...)

  ## S3 method for class 'ExploratoryClass':
anova(object, object2)

  ## S3 method for class 'ExploratoryClass':
fitted(object, digits = 3,
    ...)

  ## S3 method for class 'ExploratoryClass':
plot(x, y, type = 'info',
    npts = 50, theta_angle = 45, rot = list(xaxis = -70,
    yaxis = 30, zaxis = 10), ...)

  ## S3 method for class 'ExploratoryClass':
residuals(object, restype =
    'LD', digits = 3, df.p = FALSE, printvalue = NULL,
    verbose = TRUE, ...)

Arguments

data
a matrix or data.frame that consists of numerically ordered data, with missing data coded as NA
model
an object returned from confmirt.model() declaring how the factor model is to be estimated, or a single numeric value indicating the number of exploratory factors to estimate. See confmirt.model for more details
itemtype
type of items to be modeled, declared as a vector for each item or a single value which will be repeated globally. The NULL default assumes that the items follow a graded or 2PL structure; however, they may be changed to types such as 'Rasch', '2PL', '3PL', '4PL', 'graded', 'grsm', 'rsm', 'gpcm', 'nominal', or the nested logit variants (e.g., '2PLNRM'); see the package documentation for the complete list
grsm.block
an optional numeric vector indicating where the blocking should occur when using the grsm; NA represents items that do not belong to the grsm block (other items that may be estimated in the test data). For example, to specify two blocks of three items with a final item excluded from both blocks, use c(1,1,1,2,2,2,NA)
rsm.block
same as grsm.block, but for 'rsm' blocks
key
a numeric vector of the response scoring key. Required when using nested logit item types, and must be the same length as the number of items used. Items that are not nested logit will ignore this vector, so use NA in item locations that are not applicable
SE
logical; estimate the standard errors? Computes the information matrix from the MHRM subroutine for stochastic approximation, Bock and Lieberman style information (use only with a small number of items), or supplemented EM (SEM) computations; see SE.type
SE.type
type of estimation method to use for calculating the parameter information matrix. Can be 'MHRM' for stochastic estimation, 'BL' for the Bock and Lieberman approach (EM only), or 'SEM' for the supplemented EM approach (the default)
guess
fixed pseudo-guessing parameters. Can be entered as a single value to assign a global guessing parameter or may be entered as a numeric vector corresponding to each item
upper
fixed upper bound parameters for the 4-PL model. Can be entered as a single value to assign a global upper bound parameter or may be entered as a numeric vector corresponding to each item
rotate
type of rotation to perform after the initial orthogonal parameters have been extracted by using summary; default is 'oblimin'. If rotate != '' in the summary input then the default rotation stored in the object is ignored and the rotation specified in the input is used instead
Target
a dummy variable matrix indicting a target rotation pattern
constrain
a list of user declared equality constraints. To see how the parameters are labeled, use pars = 'values' initially. To constrain parameters to be equal, create a list of concatenated vectors giving the parameter numbers to be set equal; for example, constrain = list(c(1,5)) sets parameters 1 and 5 equal (see also the short sketch at the end of this argument list)
parprior
a list of user declared prior parameter distributions. To see how the parameters are labeled, use pars = 'values' initially. Priors may be defined as normal (typically for slopes and intercepts) or beta (typically for bounded parameters such as guessing and upper bounds) distributions
pars
a data.frame defining the starting values, parameter numbers, and estimation logical values. The user may observe how the model defines these values by using pars = 'values', and this object can in turn be modified and passed back through pars to supply new starting values or to fix/free parameters (see the grsm example below)
calcNull
logical; calculate the Null model for fit statistics (e.g., TLI)? Only applicable if the data contain no missing values (NA)
cl
a cluster object from the parallel package (e.g., created with makeCluster(ncores))
quadpts
number of quadrature points per dimension. By default the number of quadrature points follows the scheme: switch(as.character(nfact), '1'=40, '2'=20, '3'=10, '4'=7, '5'=5, 3)
printvalue
a numeric value to be specified when using the restype = 'exp' option. Only prints patterns that have standardized residuals greater than abs(printvalue). The default (NULL) prints all response patterns
x
an object of class mirt to be plotted or printed
y
an unused variable to be ignored
object
a model estimated from mirt of class ExploratoryClass or ConfirmatoryClass
object2
a second model estimated from any of the mirt package estimation methods, containing more estimated parameters than object
suppress
a numeric threshold; (possibly rotated) factor loadings with absolute values below this value are suppressed when printing. Typical values are around .3 in most statistical software. Default is 0 for no suppression
digits
number of significant digits used when printing
type
type of plot to view; can be 'info' to show the test information function, 'infocontour' for the test information contours, 'SE' for the test standard error function, 'trace' for all item trace lines, or 'infotrace' for all item information traces
theta_angle
numeric values ranging from 0 to 90 used in plot. If a vector is used then a bubble plot is created with the summed information across the angles specified (e.g., theta_angle = seq(0, 90, by=10))
npts
number of quadrature points to be used for plotting features. Larger values make plots look smoother
rot
allows rotation of the 3D graphics
large
a logical indicating whether the internal collapsed data should be returned, or a list of internally computed mirt parameters containing the data. If TRUE, a list containing the organized data used prior to estimation is returned; this list can then be passed back through large in subsequent calls to avoid reorganizing the data each time
restype
type of residuals to be displayed. Can be either 'LD' for a local dependence matrix (Chen & Thissen, 1997) or 'exp' for the expected values for the frequencies of every response pattern
df.p
logical; print the degrees of freedom and p-values?
nominal.highlow
optional matrix indicating the highest (row 1) and lowest (row 2) categories to be used for the nominal response model. Using this input may result in better numerical stability. The input should be a 2 by nitems numeric matrix, where each column corresponds to an item, the first row gives the highest category, and the second row the lowest
verbose
logical; print observed log-likelihood value at each iteration?
technical
a list containing lower level technical parameters for estimation, such as NCYCLES for the maximum number of estimation cycles (used in the examples below); consult the full package documentation for the complete set of options
...
additional arguments to be passed
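
The following sketch ties the pars and constrain arguments together (the parameter numbers returned depend on the data and model, so the slope indexing shown here is illustrative only):

#inspect the internal parameter table without estimating anything
sv <- mirt(data, 1, pars = 'values')
head(sv)   #item and parameter names, parameter numbers, starting values, and estimation flags

#collect the parameter numbers of all slopes and constrain them to be equal
slopes <- sv$parnum[sv$name == 'a1']
mod_equal <- mirt(data, 1, constrain = list(slopes))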

Confirmatory IRT

Specification of the confirmatory item factor analysis model follows many of the rules in the SEM framework for confirmatory factor analysis. The variances of the latent factors are automatically fixed to 1 to facilitate model identification. All parameters may be fixed to constant values or set equal to other parameters using the appropriate declarations. If the model is confirmatory then the returned class will be 'ConfirmatoryClass'. Confirmatory models may also contain 'explanatory' person or item level predictors, though including predictors requires the mixedmirt function.
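
For instance, a two-factor confirmatory specification for a five-item test could be set up as in the sketch below (mirroring the LSAT7 example in the Examples section; confmirt.model() reads the specification interactively from the console, terminated by a blank line):

cmodel <- confmirt.model()
   F1 = 1,4,5
   F2 = 2,3

cmod <- mirt(data, cmodel)
coef(cmod)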

Exploratory IRT

Specifying a number as the second input to mirt estimates an exploratory IRT model (the confmirt function provides a stochastic analogue with much of the same behaviour and specifications). Rotation and target matrix options are used in this subroutine and are passed to the returned object for use in generic functions such as summary() and fscores. Again, factor means and variances are fixed to ensure proper identification. If the model is exploratory then the returned class will be 'ExploratoryClass'. Estimation often begins by computing a matrix of quasi-tetrachoric correlations, potentially with Carroll's (1945) adjustment for chance responses. A MINRES factor analysis with nfact factors is then extracted and item parameters are estimated by $a_{ij} = f_{ij}/u_j$, where $f_{ij}$ is the factor loading for the jth item on the ith factor, and $u_j$ is the square root of the factor uniqueness, $\sqrt{1 - h_j^2}$. The initial intercept parameters are determined by calculating the inverse normal of the item facility (i.e., item easiness), $q_j$, to obtain $d_j = q_j / u_j$. A similar implementation is used to obtain initial values for polytomous items.
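
These starting-value computations can be illustrated roughly for dichotomous items as follows (a sketch only, not the internal routine; it assumes the psych package for the tetrachoric correlations and MINRES factor analysis, and a single factor so that the communality is simply the squared loading):

R <- psych::tetrachoric(data)$rho          #quasi-tetrachoric correlation matrix
FA <- psych::fa(R, nfactors = 1, fm = 'minres')
f <- FA$loadings[, 1]                      #factor loadings
u <- sqrt(1 - f^2)                         #square roots of the uniquenesses
a <- f / u                                 #initial slopes
q <- qnorm(colMeans(data, na.rm = TRUE))   #inverse normal of the item facilities
d <- q / u                                 #initial intercepts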

Convergence

Unrestricted full-information factor analysis is known to have problems with convergence, and some items may need to be constrained or removed entirely to obtain an acceptable solution. As a general rule, dichotomous items with means greater than .95, or with means within .05 of the guessing parameter, should be considered for removal from the analysis or treated with prior distributions. The same reasoning applies when upper bound parameters are included. Increasing the number of quadrature points per dimension may also help to stabilize the estimation process.
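
A quick screen along these lines might look like the sketch below (the .95 and .05 values are the rules of thumb given above, and the guessing vector is hypothetical):

p <- colMeans(data, na.rm = TRUE)     #dichotomous item means (facilities)
guess <- rep(.2, ncol(data))          #hypothetical fixed guessing values
flag <- p > .95 | (p - guess) < .05   #candidates for removal or prior distributions
which(flag)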

Details

mirt follows the item factor analysis strategy of marginal maximum likelihood estimation (MML) outlined in Bock and Aitkin (1981), Bock, Gibbons and Muraki (1988), and Muraki and Carlson (1995). Nested models may be compared via the approximate chi-squared difference test or by a reduction in AIC/BIC values (comparison via anova). summary and coef allow for all the rotations available from the GPArotation package (e.g., rotate = 'oblimin') as well as a 'promax' rotation. Using plot will plot the test information function or the test standard errors for 1- and 2-dimensional solutions, or all item trace lines if the test is unidimensional and consists only of dichotomous items. To examine individual item plots use itemplot. Residuals are computed using the LD statistic (Chen & Thissen, 1997) in the lower triangle of the matrix returned by residuals, with Cramer's V above the diagonal.

References

Andrich, D. (1978). A rating scale formulation for ordered response categories. Psychometrika, 43, 561-573.

Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443-459.

Bock, R. D., Gibbons, R., & Muraki, E. (1988). Full-information item factor analysis. Applied Psychological Measurement, 12(3), 261-280.

Bock, R. D., & Lieberman, M. (1970). Fitting a response model for n dichotomously scored items. Psychometrika, 35, 179-197.

Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1-29.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Addison-Wesley.

Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159-176.

Muraki, E., & Carlson, E. B. (1995). Full-information factor analysis for polytomous item responses. Applied Psychological Measurement, 19, 73-90.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Danish Institute for Educational Research.

Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monographs, 34.

Suh, Y., & Bolt, D. (2010). Nested logit models for multiple-choice item response data. Psychometrika, 75, 454-473.

Sympson, J. B. (1977). A model for testing with multidimensional items. Proceedings of the 1977 Computerized Adaptive Testing Conference.

Thissen, D. (1982). Marginal maximum likelihood estimation for the one-parameter logistic model. Psychometrika, 47, 175-186.

Wood, R., Wilson, D. T., Gibbons, R. D., Schilling, S. G., Muraki, E., & Bock, R. D. (2003). TESTFACT 4 for Windows: Test Scoring, Item Statistics, and Full-information Item Factor Analysis [Computer software]. Lincolnwood, IL: Scientific Software International.

See Also

expand.table, key2binary, confmirt.model, mirt, confmirt, bfactor, multipleGroup, mixedmirt, wald, itemplot, fscores, fitIndices, extract.item, iteminfo, testinfo, probtrace, boot.mirt, imputeMissing, itemfit, mod2values, read.mirt, simdata, createItem

Examples

#load LSAT section 7 data and compute 1 and 2 factor models
data <- expand.table(LSAT7)

(mod1 <- mirt(data, 1))
coef(mod1)
coef(mod2 <- mirt(data, 1, SE = TRUE)) #standard errors with SEM method
coef(mod3 <- mirt(data, 1, SE = TRUE, SE.type = 'BL')) #standard errors with BL method
residuals(mod1)
plot(mod1) #test information function
plot(mod1, type = 'trace') #trace lines

#estimated 3PL model for item 5 only
(mod1.3PL <- mirt(data, 1, itemtype = c('2PL', '2PL', '2PL', '2PL', '3PL')))
coef(mod1.3PL)

#two factors (exploratory)
mod2 <- mirt(data, 2)

#too few iterations, try running more using current model as new starting
#   values (could also increase NCYCLES and rerun)
mod2 <- mirt(data, 2, pars = mod2values(mod2))
coef(mod2)
summary(mod2, rotate = 'oblimin') #oblimin rotation
residuals(mod2)
plot(mod2)

anova(mod1, mod2) #compare the two models
scores <- fscores(mod2) #save factor score table
scoresfull <- fscores(mod2, full.scores = TRUE, scores.only = TRUE) #factor scores for original data

#confirmatory
#confmirt.model() reads the model specification interactively from the console;
#   type the two lines below at the prompt and end with a blank line
cmodel <- confmirt.model()
   F1 = 1,4,5
   F2 = 2,3


cmod <- mirt(data, cmodel)
coef(cmod)
anova(cmod, mod2)

###########
#data from the 'ltm' package in numeric format
pmod1 <- mirt(Science, 1)
plot(pmod1)
summary(pmod1)
fitIndices(pmod1) #M2 limited information statistic

#Constrain all slopes to be equal with the constrain = list() input
#first obtain parameter index
values <- mirt(Science,1, pars = 'values')
values #note that slopes are numbered 1,5,9,13, or index with values$parnum[values$name == 'a1']
(pmod1_equalslopes <- mirt(Science, 1, constrain = list(c(1,5,9,13))))

coef(pmod1_equalslopes)
anova(pmod1_equalslopes, pmod1) #significantly worse fit with almost all criteria

pmod2 <- mirt(Science, 2, technical = list(NCYCLES = 1000))
summary(pmod2)
plot(pmod2)
itemplot(pmod2, 1)
anova(pmod1, pmod2)

#unidimensional fit with a generalized partial credit and nominal model
(gpcmod <- mirt(Science, 1, 'gpcm'))
coef(gpcmod)

#for the nominal model the lowest and highest categories are assumed to be the
#  theoretically lowest and highest categories related to the latent trait(s); however,
#  a custom nominal.highlow matrix can be passed to declare which item categories should be
#  treated as the 'highest' and 'lowest' instead
(nomod <- mirt(Science, 1, 'nominal'))
coef(nomod) #ordering of ak values suggest that the items are indeed ordinal
anova(gpcmod, nomod)
itemplot(nomod, 3)

###########
#empirical dimensionality testing that includes 'guessing'

data(SAT12)
data <- key2binary(SAT12,
  key = c(1,4,5,2,3,1,2,1,3,1,2,4,2,1,5,3,4,4,1,4,3,3,4,1,3,5,1,3,1,5,4,5))

mod1 <- mirt(data, 1)
mod2 <- mirt(data, 2)
mod3 <- mirt(data, 3) #difficulty converging with reduced quadpts
anova(mod1,mod2)
anova(mod2, mod3) #negative AIC, 2 factors probably best

#with fixed guessing parameters
mod1g <- mirt(data, 1, guess = .1)
coef(mod1g)

###########
#graded rating scale example

#make some data
set.seed(1234)
a <- matrix(rep(1, 10))
d <- matrix(c(1,0.5,-.5,-1), 10, 4, byrow = TRUE)
c <- seq(-1, 1, length.out=10)
data <- simdata(a, d + c, 2000, itemtype = rep('graded',10))

#use much better start values to save iterations
sv <- mirt(data, 1, itemtype = 'grsm', pars = 'values')
sv[,'value'] <- c(as.vector(t(cbind(a,d,c))),0,1)

#also possible to edit start values with a GUI approach with
#   sv <- edit(sv)

mod1 <- mirt(data, 1)
mod2 <- mirt(data, 1, itemtype = 'grsm', pars = sv)
coef(mod2)
anova(mod2, mod1) #not sig, mod2 should be preferred

###########
# 2PL nominal response model example (Suh and Bolt, 2010)
data(SAT12)
SAT12[SAT12 == 8] <- NA
head(SAT12)

#correct answer key
key <- c(1,4,5,2,3,1,2,1,3,1,2,4,2,1,5,3,4,4,1,4,3,3,4,1,3,5,1,3,1,5,4,5)
scoredSAT12 <- key2binary(SAT12, key)
mod0 <- mirt(scoredSAT12, 1)

#for first 5 items use 2PLNRM and nominal
scoredSAT12[,1:5] <- as.matrix(SAT12[,1:5])
mod1 <- mirt(scoredSAT12, 1, c(rep('nominal',5),rep('2PL', 27)))
mod2 <- mirt(scoredSAT12, 1, c(rep('2PLNRM',5),rep('2PL', 27)), key=key)
coef(mod0)$Item.1
coef(mod1)$Item.1
coef(mod2)$Item.1
itemplot(mod0, 1)
itemplot(mod1, 1)
itemplot(mod2, 1)

#compare added information from distractors
Theta <- matrix(seq(-4,4,.01))
par(mfrow = c(2,3))
for(i in 1:5){
    info <- iteminfo(extract.item(mod0,i), Theta)
    info2 <- iteminfo(extract.item(mod2,i), Theta)
    plot(Theta, info2, type = 'l', main = paste('Information for item', i), ylab = 'Information')
    lines(Theta, info, col = 'red')
}

#test information
par(mfrow = c(1,1))
plot(Theta, testinfo(mod2, Theta), type = 'l', main = 'Test information', ylab = 'Information')
lines(Theta, testinfo(mod0, Theta), col = 'red')
