gdina: Function for Estimating the Generalized DINA (GDINA) Model

Description

This function implements the generalized DINA model (GDINA; de la Torre, 2011). See the paper for details about estimable cognitive diagnosis models. Multiple group estimation is also possible using the gdina function, as is the estimation of a higher order GDINA model (de la Torre & Douglas, 2004).

Usage

gdina(data, q.matrix, conv.crit=0.0001, dev.crit=.1 ,  maxit=1000, 
    linkfct="identity", Mj=NULL, group=NULL , method="WLS" , 
    delta.designmatrix=NULL, delta.basispar.lower=NULL, 
    delta.basispar.upper=NULL, delta.basispar.init=NULL, 
    zeroprob.skillclasses=NULL, reduced.skillspace=TRUE , HOGDINA=-1,
    Z.skillspace=NULL, weights=rep(1, nrow(data)), rule="GDINA", progress=TRUE,
    progress.item=FALSE , ...)

Arguments

data

A required $N$ times $J$ data matrix containing the binary responses, 0 or 1, of $N$ respondents to $J$ test items, where 1 denotes a correct answer and 0 an incorrect one. The $n$th row of the matrix represents the binary response pattern

q.matrix

A required binary $J$ times $K$ matrix indicating which attributes are required (1) or not required (0) to master the items. The $j$th row of the matrix is a binary indicator vector indicating which attributes are not required (coded by 0) and which a

conv.crit

Convergence criterion for the maximum absolute change in item parameters

dev.crit

Convergence criterion for the maximum absolute change in deviance

maxit

Maximum number of iterations

linkfct

A string which indicates the link function for the GDINA model. Options are "identity" (identity link), "logit" (logit link) and "log" (log link). The default is the "identity" link. Not

Mj

A list of design matrices and labels for each item. The definition of Mj follows the definition of $M_j$ in de la Torre (2011). To see the expected structure, inspect the value Mj returned by a default analysis. See Example 3.

group

A vector of group identifiers for multiple group estimation. Default is NULL (no multiple group estimation).

method

Estimation method for item parameters as described in de la Torre (2011). The default "WLS" weights the probabilities of attribute classes by a weighting matrix $W_j$ of expected frequencies, whereas the method "ULS"

delta.designmatrix

A design matrix for restrictions on delta. See Example 4.

delta.basispar.lower

Lower bounds for delta basis parameters.

delta.basispar.upper

Upper bounds for delta basis parameters.

delta.basispar.init

An optional vector of starting values for the basis parameters of delta. This argument only applies when a design matrix for delta is used, i.e. when delta.designmatrix is not NULL.

zeroprob.skillclasses

An optional vector of integers indicating which skill classes should have zero probability. Default is NULL (no skill classes with zero probability).

reduced.skillspace

Logical which indicates if the latent class skill space should be dimensionally reduced (see Xu & von Davier, 2008). Default is TRUE. The dimensional reduction is only well defined for more than three skills. The di

HOGDINA

Values of -1, 0 or 1 which indicate if a higher order GDINA model (see Details) should be estimated. The default value of -1 corresponds to the case that no higher order factor is assumed to exist. A value of 0 corresponds to independent attr

Z.skillspace

A user-specified design matrix for the skill space reduction as described in Xu and von Davier (2008). See Example 6 for an application.

weights

An optional vector of sample weights.

rule

A string or a vector of itemwise condensation rules. Allowed entries are GDINA, DINA, DINO, ACDM (additive cognitive diagnostic model) and RRUM (reduced reparametrized unifi

progress

Display progress on the R console?

progress.item

Display itemwise progress?

...

Further arguments to be passed

Value

An object of class gdina with the following entries:

coef

Item parameters

delta

Basis item parameters

se.delta

Standard errors of basis item parameters

itemfit.rmsea

The RMSEA item fit index (see itemfit.rmsea).

mean.rmsea

Mean of RMSEA item fit indexes.

loglike

Log-likelihood

deviance

Deviance

G

Number of groups

N

Sample size

AIC

AIC

BIC

BIC

CAIC

CAIC

Npars

Total number of parameters

Nipar

Number of item parameters

Nskillpar

Number of parameters for skill class distribution

Nskillclasses

Number of skill classes

varmat.delta

Covariance matrix of $\delta$ item parameters

varmat.plaj

XXX

posterior

Individual posterior distribution

like

Individual likelihood

data

Original data

q.matrix

Used $Q$ matrix

pattern

Individual patterns, individual MLE and MAP classifications and their corresponding probabilities

attribute.patt

Probabilities of skill classes

skill.patt

Marginal skill probabilities

subj.pattern

Individual subject pattern

attribute.patt.splitted

Split attribute pattern

pjk

Array of item response probabilities

Mj

Design matrix $M_j$ in GDINA algorithm (see de la Torre, 2011)

Aj

Design matrix $A_j$ in GDINA algorithm (see de la Torre, 2011)

delta.designmatrix

Design matrix for item parameters

reduced.skillspace

A logical indicating whether skill space reduction was performed

Z.skillspace

Design matrix for skill space reduction

beta

Parameters $\delta$ for skill class representation

covbeta

Standard errors of $\delta$ parameters

model.type

XXX

iter

Number of iterations

rrum.params

Parameters in the parametrization of the reduced RUM model if rule="RRUM".

HOGDINA

The used value of HOGDINA

a.attr

Attribute parameters $a_k$ in case of HOGDINA>=0

b.attr

Attribute parameters $b_k$ in case of HOGDINA>=0

attr.rf

Attribute response functions. This matrix contains all $a_k$ and $b_k$ parameters

...

Further values

Details

The estimation is based on an EM algorithm as described in de la Torre (2011). Item parameters are contained in the delta vector, which is a list whose $j$th entry holds the item parameters of the $j$th item.

Assume that two skills $\alpha_1$ and $\alpha_2$ are required for mastering item $j$. Then the GDINA model can be written as

$$g [ P( X_{nj} = 1 | \alpha_n ) ] = \delta_{j0} + \delta_{j1} \alpha_{n1} + \delta_{j2} \alpha_{n2} + \delta_{j12} \alpha_{n1} \alpha_{n2}$$

which is a two-way GDINA model (the rule="GDINA2" specification) with a link function $g$. If the specification ACDM is chosen, then $\delta_{j12}=0$. The DINA model (rule="DINA") assumes $\delta_{j1} = \delta_{j2} = 0$.

For the reduced RUM model (rule="RRUM"), the item response model is

$$P(X_{nj}=1 | \alpha_n ) = \pi_j^\ast \cdot r_{j1}^{1-\alpha_{n1} } \cdot r_{j2}^{1-\alpha_{n2} }$$

Taking logarithms of both sides shows that this model is equivalent to an additive model (rule="ACDM") with a logarithmic link function (linkfct="log").

If a reduced skill space (reduced.skillspace=TRUE) is employed, then the logarithm of the probability distribution of the attributes is modelled as a log-linear model:

$$\log P[ ( \alpha_{n1} , \alpha_{n2} , \ldots , \alpha_{nK} ) ] = \gamma_0 + \sum_k \gamma_k \alpha_{nk} + \sum_{k < l} \gamma_{kl} \alpha_{nk} \alpha_{nl}$$

If a higher order GDINA model is assumed (HOGDINA=1), then a higher order factor $\theta_n$ for the attributes is assumed:

$$P( \alpha_{nk} = 1 | \theta_n ) = \Phi ( a_k \theta_n + b_k )$$

For HOGDINA=0, all attributes $\alpha_{nk}$ are assumed to be independent of each other:

$$P[ ( \alpha_{n1} , \alpha_{n2} , \ldots , \alpha_{nK} ) ] = \prod_k P( \alpha_{nk} )$$
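
The equivalence between the reduced RUM and an ACDM with log link can be checked directly. The following is a sketch for the two-attribute case, writing the item index as $j$ and the respondent index as $n$ as in the GDINA equation. Taking logarithms of the reduced RUM response function gives

$$\log P(X_{nj}=1 | \alpha_n ) = \log \pi_j^\ast + (1-\alpha_{n1}) \log r_{j1} + (1-\alpha_{n2}) \log r_{j2}$$

$$= ( \log \pi_j^\ast + \log r_{j1} + \log r_{j2} ) - ( \log r_{j1} ) \alpha_{n1} - ( \log r_{j2} ) \alpha_{n2}$$

Matching terms with the GDINA equation for $g = \log$ yields $\delta_{j0} = \log ( \pi_j^\ast r_{j1} r_{j2} )$, $\delta_{j1} = - \log r_{j1}$, $\delta_{j2} = - \log r_{j2}$ and $\delta_{j12} = 0$, which is the additive (ACDM) form. This mapping is derived here from the two equations alone; it is not necessarily how the function computes or reports the entries of rrum.params.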

References

de la Torre, J. & Douglas, J. A. (2004). Higher-order latent trait models for cognitive diagnosis. Psychometrika, 69, 333-353.

de la Torre, J. (2011). The generalized DINA model framework. Psychometrika, 76, 179-199.

Xu, X. & von Davier, M. (2008). Fitting the structured general diagnostic model to NAEP data. ETS Research Report ETS RR-08-27. Princeton, ETS.

Examples

###################################################################
# EXAMPLE 1: Simulated DINA data
#    different condensation rules 
###################################################################
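library(CDM)        # gdina, din, sim.din and the sim.* data sets come from the CDM package
data(sim.dina)
data(sim.qmatrix)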

#***
# Model 1: estimation of the GDINA model (identity link)
mod1 <- gdina( data = sim.dina ,  q.matrix = sim.qmatrix , maxit=700)
summary(mod1)

#***
# Model 2: estimation of the DINA model with gdina function
mod2 <- gdina( data = sim.dina ,  q.matrix = sim.qmatrix , rule="DINA")
summary(mod2)

#***
# Model 3: compare results with din function
mod2b <- din( data = sim.dina ,  q.matrix = sim.qmatrix , rule="DINA")
summary(mod2b)
cbind( mod2$coef , mod2b$coef )

#***
# Model 4: DINA model with logit link
mod4 <- gdina( data = sim.dina ,  q.matrix = sim.qmatrix , maxit= 20 , 
                rule="DINA" , linkfct = "logit" )
summary(mod4)

#***
# Model 5: DINA model log link
mod5 <- gdina( data = sim.dina ,  q.matrix = sim.qmatrix , maxit=100 , 
                    rule="DINA" , linkfct = "log" )
summary(mod5)

#***
# Model 6: RRUM model
mod6 <- gdina( data = sim.dina, q.matrix = sim.qmatrix, maxit=100,  rule="RRUM")
summary(mod6)

#***
# Model 7: Higher order GDINA model
mod7 <- gdina( data = sim.dina, q.matrix = sim.qmatrix, maxit=100,  HOGDINA=1)
summary(mod7)

#***
# Model 8: Independence GDINA model
mod8 <- gdina( data = sim.dina, q.matrix = sim.qmatrix, maxit=100,  HOGDINA=0)
summary(mod8)

###################################################################
# EXAMPLE 2: Simulated DINO data
#    additive cognitive diagnosis model
#    with different link functions
###################################################################

#***
# Model 1: additive cognitive diagnosis model (ACDM; identity link)
mod1 <- gdina( data=sim.dino,  q.matrix=sim.qmatrix,  
                    rule="ACDM")
summary(mod1)

#***
# Model 2: ACDM logit link
mod2 <- gdina( data=sim.dino, q.matrix=sim.qmatrix,  
                    rule="ACDM", linkfct="logit" )
summary(mod2)

#***
# Model 3: ACDM log link
mod3 <- gdina( data=sim.dino,  q.matrix=sim.qmatrix,  
                rule="ACDM", linkfct="log" )
summary(mod3)

#***
# Model 4: Different condensation rules per item
I <- 9      # number of items
rule <- rep( "GDINA" , I )
rule[1] <- "DINO"   # 1st item: DINO model
rule[7] <- "GDINA2" # 7th item: GDINA model with first- 
                    #           and second-order interactions
rule[8] <- "ACDM"   # 8th item: additive CDM
rule[9] <- "DINA"   # 9th item: DINA model
mod4 <- gdina( data=sim.dino, q.matrix=sim.qmatrix, rule=rule )
summary(mod4)

###################################################################
# EXAMPLE 3: Model with user-specified design matrices
###################################################################

# do a preliminary analysis and modify obtained design matrices
mod0 <- gdina( data = sim.dino ,  q.matrix = sim.qmatrix ,  maxit=1)

# extract default design matrices
Mj <- mod0$Mj
Mj.user <- Mj   # these user defined design matrices are modified.
#~~~
# For the second item, the following model should hold
# X2 ~ V2 + V2*V3
mj <- Mj[[2]][[1]]
mj.lab <- Mj[[2]][[2]]
mj <- mj[,-3]
mj.lab <- mj.lab[-3]
Mj.user[[2]] <- list( mj , mj.lab )
#    [[1]]
#        [,1] [,2] [,3]
#    [1,]    1    0    0
#    [2,]    1    1    0
#    [3,]    1    0    0
#    [4,]    1    1    1
#    [[2]]
#    [1] "0"   "1"   "1-2"    
#~~~
# For the eighth item an equality constraint should hold
# X8 ~ a*V2 + a*V3 + V2*V3
mj <- Mj[[8]][[1]]
mj.lab <- Mj[[8]][[2]]
mj[,2] <- mj[,2] + mj[,3]
mj <- mj[,-3]
mj.lab <- c("0" , "1=2" , "1-2" )
Mj.user[[8]] <- list( mj , mj.lab )
Mj.user
mod <- gdina( data = sim.dino ,  q.matrix = sim.qmatrix ,
                    Mj = Mj.user ,  maxit=200 )
summary(mod)

###################################################################
# EXAMPLE 4: Design matrix for delta parameters
###################################################################

#~~~
# estimate an initial model
mod0 <- gdina( data = sim.dino ,  q.matrix = sim.qmatrix , 
            rule="ACDM" , maxit=1)
# extract coefficients
c0 <- mod0$coef
I <- 9  # number of items
delta.designmatrix <- matrix( 0 , nrow= nrow(c0) , ncol = nrow(c0) )
diag( delta.designmatrix) <- 1
# set intercept of item 1 and item 3 equal to each other
delta.designmatrix[ 7 , 1 ] <- 1 ; delta.designmatrix[,7] <- 0
# set loading of V1 of item1 and item 3 equal
delta.designmatrix[ 8 , 2 ] <- 1 ; delta.designmatrix[,8] <- 0
delta.designmatrix <- delta.designmatrix[ , -c(7:8) ]       
                # exclude original parameters with indices 7 and 8

#***
# Model 1: ACDM with designmatrix
mod1 <- gdina( data = sim.dino ,  q.matrix = sim.qmatrix ,  rule="ACDM" , 
            delta.designmatrix = delta.designmatrix )
summary(mod1)            

#***
# Model 2: Same model, but with logit link instead of identity link function
mod2 <- gdina( data = sim.dino ,  q.matrix = sim.qmatrix ,  rule="ACDM" , 
            delta.designmatrix = delta.designmatrix , 
            maxit=100 , linkfct = "logit")
summary(mod2)            

###################################################################
# EXAMPLE 5: Multiple group estimation (simulated data)
###################################################################

# simulate data
set.seed(9279)
N1 <- 200 ; N2 <- 100   # group sizes
I <- 10                 # number of items
q.matrix <- matrix(0,I,2)   # create Q matrix
q.matrix[1:7,1] <- 1 ; q.matrix[ 5:10,2] <- 1
# simulate first group
dat1 <- sim.din(N1, q.matrix=q.matrix , mean = c(0,0) )$dat
# simulate second group
dat2 <- sim.din(N2, q.matrix=q.matrix , mean = c(-.3 , -.7) )$dat
# merge data
dat <- rbind( dat1 , dat2 )
# group indicator 
group <- c( rep(1,N1) , rep(2,N2) )

# estimate GDINA model
mod <- gdina( data = dat , q.matrix = q.matrix ,  group= group)
summary(mod)

# estimate DINA model
mod2 <- gdina( data = dat , q.matrix = q.matrix , 
                group= group , rule="DINA")
summary(mod2)                       

###################################################################
# EXAMPLE 6: User specified reduced skill space
###################################################################

#   Some correlations between attributes should be set to zero.
q.matrix <- expand.grid( c(0,1) , c(0,1) , c(0,1) , c(0,1) )
colnames(q.matrix) <- paste("Attr" , 1:4 ,sep="")
q.matrix <- q.matrix[ -1 , ]
Sigma <- matrix( .5 , nrow=4 , ncol=4 )
diag(Sigma) <- 1
Sigma[3,2] <- Sigma[2,3] <- 0 # set correlation of attribute A2 and A3 to zero
dat <- sim.din( N=1000 , q.matrix = q.matrix , Sigma = Sigma)$dat

#~~~ Step 1: initial estimation
mod1a <- gdina( data=dat , q.matrix = q.matrix , maxit= 1 , rule="DINA")
# estimate also "full" model
mod1 <- gdina( data=dat , q.matrix = q.matrix , rule="DINA")

#~~~ Step 2: modify designmatrix for reduced skillspace
Z.skillspace <- data.frame( mod1a$Z.skillspace )
# set correlations of A2/A3 and A2/A4 to zero
vars <- c("A2_A3","A2_A4") 
for (vv in vars){ Z.skillspace[,vv] <- NULL }

#~~~ Step 3: estimate model with reduced skillspace
mod2 <- gdina( data=dat , q.matrix = q.matrix , 
        Z.skillspace=Z.skillspace , rule="DINA")

#~~~ eliminate all covariances
Z.skillspace <- data.frame( mod1$Z.skillspace )
colnames(Z.skillspace)
Z.skillspace <- Z.skillspace[ , - 
	grep( "_" , colnames(Z.skillspace ) , fixed=TRUE)]
colnames(Z.skillspace)

mod3 <- gdina( data=dat , q.matrix = q.matrix , 
        Z.skillspace=Z.skillspace , rule="DINA")
summary(mod1); summary(mod2); summary(mod3)
