GDINA: Calibrate dichotomous and polytomous responses

Description

GDINA calibrates the generalized deterministic inputs, noisy and gate (G-DINA; de la Torre, 2011) model for dichotomous responses, and its extension, the sequential G-DINA model (Ma, & de la Torre, 2016a) for ordinal and nominal responses. By setting appropriate constraints, the deterministic inputs, noisy and gate (DINA; de la Torre, 2009; Junker & Sijtsma, 2001) model, the deterministic inputs, noisy or gate (DINO; Templin & Henson, 2006) model, the reduced reparametrized unified model (R-RUM; Hartz, 2002), the additive CDM (A-CDM; de la Torre, 2011), and the linear logistic model (LLM; Maris, 1999) can also be calibrated. Note that the LLM is equivalent to the C-RUM (Hartz, 2002), a special case of the GDM (von Davier, 2008), and that the R-RUM is also known as a special case of the generalized NIDA model (de la Torre, 2011). Different models can be fitted to different items in a single test. The attributes can be either dichotomous or polytomous (Chen & de la Torre, 2013). Joint attribute distribution can be saturated, structured or higher-order (de la Torre & Douglas, 2004) when attributes are binary. Marginal maximum likelihood method with Expectation-Maximization (MMLE/EM) alogrithm is used for item parameter estimation. To compare two or more GDINA models, use method anova. To extract higher-order parameters, use method hoparm. To extract lower-order structural (item) parameters, use method itemparm. To calculate lower-order incidental (person) parameters use method personparm. To extract other components returned, use extract. To plot item/category response function, use plotIRF. To check whether monotonicity is violated, use monocheck. To conduct anaysis in graphical user interface, use startGDINA.

Usage

GDINA(dat, Q, model = "GDINA", sequential = FALSE, att.dist = "saturated",
  att.prior = NULL, att.str = FALSE, mono.constraint = FALSE,
  group = NULL, verbose = 1, catprob.parm = NULL, lower.p = 1e-04,
  upper.p = 0.9999, item.names = NULL, nstarts = 1, conv.crit = 0.001,
  lower.prior = -1, conv.type = "max.p.change", maxitr = 1000,
  digits = 4, diagnosis = 0, Mstep.warning = FALSE, optimizer = "all",
  randomseed = 123456, smallNcorrection = c(5e-04, 0.001),
  higher.order = list(model = "Rasch", method = "MMLE", nquad = 19, type =
  "testwise", slope.range = c(0.1, 5), intercept.range = c(-3, 3), slope.prior =
  c(0, 0.25), intercept.prior = c(0, 1)), optim.control = list())
# S3 method for GDINA
anova(object, ...)
# S3 method for GDINA
extract(object, what, SE.type = 2, ...)
# S3 method for GDINA
hoparm(object, withSE = FALSE, theta.est = FALSE,
  digits = 4, ...)
# S3 method for GDINA
itemparm(object, what = c("catprob", "itemprob", "LCprob",
  "gs", "delta", "rrum"), withSE = FALSE, SE.type = 2, digits = 4, ...)
# S3 method for GDINA
personparm(object, what = c("EAP", "MAP", "MLE", "mp"),
  digits = 4, ...)
# S3 method for GDINA
AIC(object, ...)
# S3 method for GDINA
BIC(object, ...)
# S3 method for GDINA
logLik(object, ...)
# S3 method for GDINA
deviance(object, ...)
# S3 method for GDINA
npar(object, ...)
# S3 method for GDINA
indlogLik(object, ...)
# S3 method for GDINA
indlogPost(object, ...)
# S3 method for GDINA
summary(object, ...)

Arguments

dat

A required $N \times J$ matrix or data.frame consisting of the responses of $N$ individuals to $J$ items. Missing values need to be coded as NA.

A required $J \times K$ item or category and attribute association matrix, wher $J$ represents the number of items or nonzero categories and $K$ represents the number of attributes. For binary attributes, entry 1 indicates that the attribute is measured by the item, and 0 otherwise. For polytomous attributes, non-zero elements indicate the level of attributes that are needed for an individual to answer the item correctly (see Chen, & de la Torre, 2013). Note that for polytomous items, the sequential G-DINA model is used and either restricted or unrestricted category-level Q-matrix is needed. In the category-level Q-matrix, the first column gives the item number, which must be numeric and match the number of column in the data. The second column indicates the category number. See Examples.

model

A vector for each item or nonzero category, or a scalar which will be used for all items or nonzero categories to specify the CDMs fitted. The possible options include "GDINA","DINA","DINO","ACDM","LLM", and "RRUM". It is also possible to specify CDMs using numbers. Particularly, 0,1,2,3,4 and 5 represents "GDINA","DINA","DINO","ACDM","LLM", and "RRUM", respectively.

sequential

logical; TRUE if the sequential model is fitted for polytomous responses.

att.dist

How is the joint attribute distribution estimated? It can be saturated, indicating that the proportion parameter for each permissible latent class is estimated separately; higher.order, indicating that a higher-order joint attribute distribution is assumed (higher-order model can be specified in higher.order argument); or fixed, indicating that the weights specified in att.prior argument are fixed in the estimation process. If att.prior is not specified, a uniform joint attribute distribution is employed initially. If different groups have different joint attribute distributions, specify att.dist as a character vector with the same number of elements as the number of groups.

att.prior

A vector of length $2^K$ for single group estimation, or a matrix of dimension $2^K\times$ no. of groups to specify attribute prior distribution for $2^K$ latent classes for all groups. Only applicable for dichotomous attributes. The sum of all elements does not have to be equal to 1; however, it will be transformed so that the sum is equal to 1 before model calibration. The label for each latent class can be obtained by calling attributepattern(K). See examples for more info.

att.str

logical; are attributes structured?

mono.constraint

logical; TRUE indicates that $P(\bm{\alpha}_1) <=P(\bm{\alpha}_2)$ if for all $k$, $\alpha_{1k} < \alpha_{2k}$. Can be a vector for each item or nonzero category or a scalar which will be used for all items to specify whether monotonicity constraint should be added.

group

a scalar indicating which column in dat is group indicator or a numerical vector indicating the group each individual belongs to. If it is a vector, its length must be equal to the number of individuals. Only at most two groups can be handled currently.

verbose

How to print calibration information after each EM iteration? Can be 0, 1 or 2, indicating to print no information, information for current iteration, or information for all iterations.

catprob.parm

A list of initial success probability parameters for each nonzero category.

lower.p

A vector for each item or nonzero category, or a scalar which will be used for all items or nonzero categories to specify the lower bound for success probabilities. The default is 1e-4 for all items.

upper.p

A vector for each item or nonzero category, or a scalar which will be used for all items or nonzero categories to specify the upper bound for success probabilities. The default is 0.9999 for all items.

item.names

A vector giving the item names. By default, items are named as "Item 1", "Item 2", etc.

nstarts

how many sets of starting values? The default is 1.

conv.crit

The convergence criterion for max absolute change in item parameters or deviance.

lower.prior

The lower bound for prior weights. Only applicable for nonstructured attributes. The default value is -1, which means the lower bound is $1/2^K/100$.

conv.type

How is the convergence criterion evaluated? Can be "max.p.change", indicating the maximum absolute change in success probabilities, or "dev.change", representing the absolute change in deviance.

maxitr

The maximum number of EM cycles allowed.

digits

How many decimal places in each number? The default is 4.

diagnosis

Run in diagnostic mode? If it is 1 or 2, some intermediate results obtained in each iteration can be extracted.

Mstep.warning

Logical; Should the warning message in Mstep, if any, be output immediately.

optimizer

A string indicating which optimizer should be used in M-step.

randomseed

Random seed for generating initial item parameters. The default random seed is 123456.

smallNcorrection

A numeric vector with two elements specifying the corrections applied when the expected number of individuals in some latent groups are too small. If the expected no. of examinees is less than the second element, the first element and two times the first element will be added to the numerator and denominator of the closed-form solution of probabilities of success. Only applicable for the G-DINA, DINA and DINO model estimation without monotonic constraints.

higher.order

A list specifying the higher-order joint attribute distribution with the following components: (1) model - a character indicating the IRT model for higher-order joint attribute distribution. Can be "2PL", "1PL" or "Rasch", representing two parameter logistic IRT model, one parameter logistic IRT model and Rasch model, respectively. For "1PL" model, a common slope parameter is estimated (see Details). "Rasch" is the default model when att.dist = "higher.order". (2) method - a character indicating the algorithm for the higher-order structural parameter estimation; Can be either "BL", "MMLE", which is the default, or "BMLE", which allows parameter priors to be imposed. (3) nquad - a scalar specifying the number of integral nodes. (4) type - a character specifying whether all higher-order structural parameters are estimated at the same time (i.e., type="testwise") or estimated attribute by attribute (i.e., type="attwise", only applicable when method="MMLE" or method="BMLE"). (5) slope.range - a vector of length two specifying the range of slope parameters. (6) intercept.range - a vector of length two specifying the range of intercept parameters. (7) slope.prior - a vector of length two specifying the mean and variance of log(slope) parameters, which are assumed normally distributed. (8) intercept.prior - a vector of length two specifying the mean and variance of intercept parameters, which are assumed normally distributed.

optim.control

Control options for optimizers in the M-step. Only available when optimizer is one specific optimization method, including BFGS from optim, slsqp, solnp and auglag. For the auglag method, optim.control specifies control.outer.

object

estimated GDINA object for various S3 methods

...

additional arguments

what

argument for various S3 methods

SE.type

type of standard errors.

withSE

argument for method itemparm; show standard errors or not?

theta.est

logical; Estimating higher-order person ability or not? The default is FALSE.

Value

GDINA returns an object of class GDINA. Methods for GDINA objects include extract for extracting various components, itemparm for extracting item parameters, personparm for calculating person parameters, summary for summary information. AIC, BIC,logLik, deviance and npar can also be used to calculate AIC, BIC, observed log-likelihood, deviance and number of parameters.

Methods (by generic)

anova: Model comparison using likelihood ratio test
extract: extract various elements of GDINA estimates
hoparm: extract higher-order parameters
itemparm: extract various item parameters
personparm: calculate person attribute patterns and higher-order ability
AIC: calculate AIC
BIC: calculate BIC
logLik: calculate log-likelihood
deviance: calculate deviance
npar: calculate the number of parameters
indlogLik: extract log-likelihood for each individual
indlogPost: extract log posterior for each individual
summary: print summary information

The G-DINA model

The generalized DINA model (G-DINA; de la Torre, 2011) is an extension of the DINA model. Unlike the DINA model, which collaspes all latent classes into two latent groups for each item, if item $j$ requires $K_j^*$ attributes, the G-DINA model collapses $2^K$ latent classes into $2^{K_j^*}$ latent groups with unique success probabilities on item $j$, where $K_j^*=\sum_{k=1}^{K}q_{jk}$. Let $\bm{\alpha}_{lj}^*$ be the reduced attribute pattern consisting of the columns of the attributes required by item $j$, where $l=1,\ldots,2^{K_j^*}$. For example, if only the first and the last attributes are required, $\bm{\alpha}_{lj}^*=(\alpha_{l1},\alpha_{lK})$. For notational convenience, the first $K_j^*$ attributes can be assumed to be the required attributes for item $j$ as in de la Torre (2011). The probability of success $P(X_{j}=1|\bm{\alpha}_{lj}^*)$ is denoted by $P(\bm{\alpha}_{lj}^*)$. To model this probability of success, different link functions as in the generalized linear models are used in the G-DINA model. The item response function of the G-DINA model using the identity link can be written as $$P(\bm{\alpha}_{lj}^*)=\delta_{j0}+\sum_{k=1}^{K_j^*}\delta_{jk}\alpha_{lk}+ \sum_{k'=k+1}^{K_j^*}\sum_{k=1}^{K_j^*-1}\delta_{jkk'}\alpha_{lk}\alpha_{lk'}+\cdots+ \delta_{j12{\cdots}K_j^*}\prod_{k=1}^{K_j^*}\alpha_{lk}, $$ where $\delta_{j0}$ is the intercept for item $j$, $\delta_{jk}$ is the main effect due to $\alpha_{lk}$, $\delta_{jkk'}$ is the interaction effect due to $\alpha_{lk}$ and $\alpha_{lk'}$, $\delta_{j12{\ldots}K_j^*}$ is the interaction effect due to $\alpha_{l1}, \cdots,\alpha_{lK_j^*}$. The log and logit links can also be employed.

Other CDMs as special cases

Several widely used CDMs can be obtained by setting appropriate constraints to the G-DINA model. This section introduces the parameterization of different CDMs within the G-DINA model framework very breifly. Readers interested in this please refer to de la Torre(2011) for details.

DINA model: In DINA model, each item has two item parameters - guessing ($g$) and slip ($s$). In traditional parameterization of the DINA model, a latent variable $\eta$ for person $i$ and item $j$ is defined as $$\eta_{ij}=\prod_{k=1}^K\alpha_{ik}^{q_{jk}}$$ Briefly speaking, if individual $i$ master all attributes required by item $j$, $\eta_{ij}=1$; otherwise, $\eta_{ij}=0$. Item response function of the DINA model can be written by $$P(X_{ij}=1|\eta_{ij})=(1-s_j)^{\eta_{ij}}g_j^{1-\eta_{ij}}$$ To obtain the DINA model from the G-DINA model, all terms in identity link G-DINA model except $\delta_0$ and $\delta_{12{\ldots}K_j^*}$ need to be fixed to zero, that is, $$ P(\bm{\alpha}_{lj}^*)=\delta_{j0}+\delta_{j12{\cdots}K_j^*}\prod_{k=1}^{K_j^*}\alpha_{lk}$$ In this parameterization, $\delta_{j0}=g_j$ and $\delta_{j0}+\delta_{j12{\cdots}K_j^*}=1-s_j$.
DINO model: The DINO model can be given by $$P(\bm{\alpha}_{lj}^*)=\delta_{j0}+\delta_{j1}I(\bm{\alpha}_{lj}^*\neq \bm{0})$$ where $I(\cdot)$ is an indicator variable. The DINO model is also a constrained identity link G-DINA model. As shown by de la Torre (2011), the appropriate constraint is $$\delta_{jk}=-\delta_{jk^{'}k^{''}}=\cdots=(-1)^{K_j^*+1}\delta_{j12{\cdots}K_j^*},$$ for $k=1,\cdots,K_j^*, k^{'}=1,\cdots,K_j^*-1$, and $k^{''}>k^{'},\cdots,K_j^*$.
Additive models with different link functions: The A-CDM, LLM and R-RUM can be obtained by setting all interactions to be zero in identity, logit and log link G-DINA model, respectively. Specifically, the A-CDM can be formulated as $$P(\bm{\alpha}_{lj}^*)=\delta_{j0}+\sum_{k=1}^{K_j^*}\delta_{jk}\alpha_{lk}.$$ The item response function for LLM can be given by $$ logit[P(\bm{\alpha}_{lj}^*)]=\delta_{j0}+\sum_{k=1}^{K_j^*}\delta_{jk}\alpha_{lk},$$ and lastly, the RRUM, can be written as $$log[P(\bm{\alpha}_{lj}^*)]=\delta_{j0}+\sum_{k=1}^{K_j^*}\delta_{jk}\alpha_{lk}.$$ It should be noted that the LLM is equivalent to the compensatory RUM, which is subsumed by the GDM, and that the RRUM is a special case of the generalized noisy inputs, deterministic ``And" gate model (G-NIDA).

Model Estimation

The MMLE/EM algorithm is implemented in this package. For G-DINA, DINA and DINO models, closed-form solutions can be found. Specifically, for the G-DINA model, $$P(\alpha_{lj}^*)=R_{jl}/N_{jl}$$ where $R_{jl}$ is the expected number of examinees with attribute pattern $\alpha_{lj}^*$ answering item $j$ correctly and $N_{jl}$ is the expected number of examinees with attribute pattern $\alpha_{lj}^*$. For DINA or DINO model, $R_{jl}$ and $N_{jl}$ are collapsed for latent classes having the same probability of success. See de la Torre (2009) and de la Torre (2011) for details. For ACDM, LLM and RRUM, closed-form solutions do not exist, and therefore some general optimization techniques are adopted in M-step. See Ma, Iaconangelo and de la Torre (2016) for details. The selection of optimization techniques mainly depends on whether some specific constraints need to be added. It should be noted that adding monotone constraints to the G-DINA model may dramatically increase running time especially when the number of required attributes are large. The sequential G-DINA model can be estimated as in Ma & de la Torre (2016a) using optimization techniques. However, Ma & de la Torre (2016b) found that the sequential G-DINA, DINA and DINO models can be estimated using close-form solutions, which can be implemented in a straightforward manner using the observation-coding (Tutz, 1997). For estimating the joint attribute distribution, by default, an empirical Bayes method (saturated; Carlin & Louis, 2000) is adopted, which is referred to as the saturated attribute structure. Specifically, the prior distribution of joint attributes is uniform at the beginning, and then updated after each EM iteration based on the posterior distribution. The joint attribute distribution can also be modeled using some higher-order IRT models, which is referred to as higher-order attribute structure. The higher-order attribute structure was originally proposed by de la Torre and Douglas (2004) for the DINA model. It has been extended in this package for the G-DINA model, DINA, DINO, A-CDM, LLM and RRUM. Particularly, three IRT models are available for the higher-order attribute structure: Rasch model (Rasch), one parameter logistic model (1PL) and two parameter logistic model (2PL). For the Rasch model, the probability of mastering attribute $k$ for individual $i$ is defined as $$P(\alpha_k=1|\theta_i,\lambda_{0k})=\frac{exp(\theta_i+\lambda_{0k})}{1+exp(\theta_i+\lambda_{0k})}$$ For the 1PL model, the probability of mastering attribute $k$ for individual $i$ is defined as $$P(\alpha_k=1|\theta_i,\lambda_{0k},\lambda_{1})=\frac{exp(\lambda_{1}\theta_i+\lambda_{0k})}{1+exp(\lambda_{1}\theta_i+\lambda_{0k})}$$ For the 2PL model, the probability of mastering attribute $k$ for individual $i$ is defined as $$P(\alpha_k=1|\theta_i,\lambda_{0k},\lambda_{1k})=\frac{exp(\lambda_{1k}\theta_i+\lambda_{0k})}{1+exp(\lambda_{1k}\theta_i+\lambda_{0k})}$$ where $\theta_i$ is the ability of examinee $i$. $\lambda_{0k}$ and $\lambda_{1k}$ are the intercept and slope parameters for attribute $k$, respectively. In the Rasch model, $\lambda_{1k}=1 \forall k$; whereas in the 1PL model, a common slope parameter $\lambda_{1}$ is estimated. The probability of joint attributes can be written as $$P(\strong{\alpha}|\theta_i,\strong{\lambda})=\prod_k P(\alpha_k|\theta_i,\strong{\lambda})$$.

The Number of Parameters

For dichotomous response models: Assume a test measures $K$ attributes and item $j$ requires $K_j^*$ attributes: The DINA and DINO model has 2 item parameters for each item; if item $j$ is ACDM, LLM or RRUM, it has $K_j^*+1$ item parameters; if it is G-DINA model, it has $2^{K_j^*}$ item parameters. Apart from item parameters, the parameters involved in the estimation of joint attribute distribution need to be estimated as well. When using the saturated attribute structure, there are $2^K-1$ parameters for joint attribute distribution estimation; when using a higher-order attribute structure, there are $K$, $K+1$, and $2\times K$ parameters for the Rasch model, 1PL model and 2PL model, respectively. For polytomous response data using the sequential G-DINA model, the number of item parameters are counted at category level.

References

Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443-459. Bock, R. D., & Lieberman, M. (1970). Fitting a response model forn dichotomously scored items. Psychometrika, 35, 179-197. Carlin, B. P., & Louis, T. A. (2000). Bayes and empirical bayes methods for data analysis. New York, NY: Chapman & Hall de la Torre, J. (2009). DINA Model and Parameter Estimation: A Didactic. Journal of Educational and Behavioral Statistics, 34, 115-130. de la Torre, J. (2011). The generalized DINA model framework. Psychometrika, 76, 179-199. de la Torre, J., & Douglas, J. A. (2004). Higher-order latent trait models for cognitive diagnosis. Psychometrika, 69, 333-353. de la Torre, J., & Lee, Y. S. (2013). Evaluating the wald test for item-level comparison of saturated and reduced models in cognitive diagnosis. Journal of Educational Measurement, 50, 355-373. Haertel, E. H. (1989). Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement, 26, 301-321. Hartz, S. M. (2002). A bayesian framework for the unified model for assessing cognitive abilities: Blending theory with practicality (Unpublished doctoral dissertation). University of Illinois at Urbana-Champaign. Junker, B. W., & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25, 258-272. Ma, W., & de la Torre, J. (2016a). A sequential cognitive diagnosis model for polytomous responses. British Journal of Mathematical and Statistical Psychology. 69, 253-275. Ma, W., & de la Torre, J. (2016b, July). A Q-matrix validation method for the sequential G-DINA model. Paper presented at the 80th International Meeting of the Psychometric Society, Asheville, NC. Ma, W., Iaconangelo, C., & de la Torre, J. (2016). Model similarity, model selection and attribute classification. Applied Psychological Measurement, 40, 200-217. Maris, E. (1999). Estimating multiple classification latent class models. Psychometrika, 64, 187-212. Tatsuoka, K. K. (1983). Rule space: An approach for dealing with misconceptions based on item response theory. Journal of Educational Measurement, 20, 345-354. Templin, J. L., & Henson, R. A. (2006). Measurement of psychological disorders using cognitive diagnosis models. Psychological Methods, 11, 287-305. Tutz, G. (1997). Sequential models for ordered responses. In W.J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory p. 139-152). New York, NY: Springer.

Examples

Run this code

## Not run: ------------------------------------
# ####################################
# #        Example 1.                #
# #     GDINA, DINA, DINO            #
# #    ACDM, LLM and RRUM            #
# # estimation and comparison        #
# #                                  #
# ####################################
# 
# dat <- sim10GDINA$simdat
# Q <- sim10GDINA$simQ
# 
# #--------GDINA model --------#
# 
# mod1 <- GDINA(dat = dat, Q = Q, model = "GDINA")
# mod1
# # summary information
# summary(mod1)
# 
# AIC(mod1) #AIC
# BIC(mod1) #BIC
# logLik(mod1) #log-likelihood value
# deviance(mod1) # deviance: -2 log-likelihood
# npar(mod1) # number of parameters
# 
# 
# head(indlogLik(mod1)) # individual log-likelihood
# head(indlogPost(mod1)) # individual log-posterior
# 
# # item parameters
# # see ?itemparm
# itemparm(mod1) # item probabilities of success for each latent group
# itemparm(mod1, withSE = TRUE) # item probabilities of success & standard errors
# itemparm(mod1, what = "delta") # delta parameters
# itemparm(mod1, what = "delta",withSE=TRUE) # delta parameters
# itemparm(mod1, what = "gs") # guessing and slip parameters
# itemparm(mod1, what = "gs",withSE = TRUE) # guessing and slip parameters & standard errors
# 
# # person parameters
# # see ?personparm
# personparm(mod1) # EAP estimates of attribute profiles
# personparm(mod1, what = "MAP") # MAP estimates of attribute profiles
# personparm(mod1, what = "MLE") # MLE estimates of attribute profiles
# 
# #plot item response functions for item 10
# plotIRF(mod1,item = 10)
# plotIRF(mod1,item = 10,errorbar = TRUE) # with error bars
# plotIRF(mod1,item = c(6,10))
# 
# # Use extract function to extract more components
# # See ?extract
# 
# # ------- DINA model --------#
# dat <- sim10GDINA$simdat
# Q <- sim10GDINA$simQ
# mod2 <- GDINA(dat = dat, Q = Q, model = "DINA")
# mod2
# itemparm(mod2, what = "gs") # guess and slip parameters
# itemparm(mod2, what = "gs",withSE = TRUE) # guess and slip parameters and standard errors
# 
# # Model comparison at the test level via likelihood ratio test
# anova(mod1,mod2)
# 
# # -------- DINO model -------#
# dat <- sim10GDINA$simdat
# Q <- sim10GDINA$simQ
# mod3 <- GDINA(dat = dat, Q = Q, model = "DINO")
# #slip and guessing
# itemparm(mod3, what = "gs") # guess and slip parameters
# itemparm(mod3, what = "gs",withSE = TRUE) # guess and slip parameters + standard errors
# 
# # Model comparison at test level via likelihood ratio test
# anova(mod1,mod2,mod3)
# 
# # --------- ACDM model -------#
# dat <- sim10GDINA$simdat
# Q <- sim10GDINA$simQ
# mod4 <- GDINA(dat = dat, Q = Q, model = "ACDM")
# mod4
# # --------- LLM model -------#
# dat <- sim10GDINA$simdat
# Q <- sim10GDINA$simQ
# mod4b <- GDINA(dat = dat, Q = Q, model = "LLM")
# mod4b
# # --------- RRUM model -------#
# dat <- sim10GDINA$simdat
# Q <- sim10GDINA$simQ
# mod4c <- GDINA(dat = dat, Q = Q, model = "RRUM")
# mod4c
# 
# # --- Different CDMs for different items --- #
# 
# dat <- sim10GDINA$simdat
# Q <- sim10GDINA$simQ
# models <- c(rep("GDINA",3),"LLM","DINA","DINO","ACDM","RRUM","LLM","RRUM")
# mod5 <- GDINA(dat = dat, Q = Q, model = models)
# anova(mod1,mod2,mod3,mod4,mod4b,mod4c,mod5)
# 
# 
# ####################################
# #        Example 2.                #
# #        Model estimations         #
# # With monotonocity constraints    #
# ####################################
# dat <- sim10GDINA$simdat
# Q <- sim10GDINA$simQ
# # for item 10 only
# mod11 <- GDINA(dat = dat, Q = Q, model = "GDINA",mono.constraint = c(rep(FALSE,9),TRUE))
# mod11
# mod11a <- GDINA(dat = dat, Q = Q, model = "DINA",mono.constraint = TRUE)
# mod11a
# mod11b <- GDINA(dat = dat, Q = Q, model = "ACDM",mono.constraint = TRUE)
# mod11b
# mod11c <- GDINA(dat = dat, Q = Q, model = "LLM",mono.constraint = TRUE)
# mod11c
# mod11d <- GDINA(dat = dat, Q = Q, model = "RRUM",mono.constraint = TRUE)
# mod11d
# itemparm(mod11d,"delta")
# itemparm(mod11d,"rrum")
# 
# ####################################
# #           Example 3.             #
# #        Model estimations         #
# # With Higher-order att structure  #
# ####################################
# 
# dat <- sim10GDINA$simdat
# Q <- sim10GDINA$simQ
# # --- Higher order G-DINA model ---#
# mod12 <- GDINA(dat = dat, Q = Q, model = "DINA",
#                att.dist="higher.order",higher.order=list(method="MMLE",nquad=31))
# hoest=hoparm(mod12) # extract higher-order parameters
# hoest$theta # ability
# hoest$lambda # structural parameters
# # --- Higher order DINA model ---#
# mod22 <- GDINA(dat = dat, Q = Q, model = "DINA",
#                att.dist="higher.order",higher.order=list(method="BMLE"))
# # --- Higher order DINO model ---#
# mod23 <- GDINA(dat = dat, Q = Q, model = "DINO",att.dist="higher.order")
# # --- Higher order ACDM model ---#
# mod24 <- GDINA(dat = dat, Q = Q, model = "ACDM",
#                att.dist="higher.order",higher.order=list(model="1PL"))
# # --- Higher order LLM model ---#
# mod25 <- GDINA(dat = dat, Q = Q, model = "LLM",att.dist="higher.order")
# # --- Higher order RRUM model ---#
# mod26 <- GDINA(dat = dat, Q = Q, model = "RRUM",att.dist="higher.order")
# 
# ####################################
# #          Example 4.              #
# #        Model estimations         #
# # With user-specified att structure#
# ####################################
# 
# # --- User-specified attribute priors ----#
# # prior distribution is fixed during calibration
# # Assume each of 000,100,010 and 001 has probability of 0.1
# # and each of 110, 101,011 and 111 has probability of 0.15
# # Note that the sum is equal to 1
# #
# prior <- c(0.1,0.1,0.1,0.1,0.15,0.15,0.15,0.15)
# # fit GDINA model  with fixed prior dist.
# dat <- sim10GDINA$simdat
# Q <- sim10GDINA$simQ
# modp1 <- GDINA(dat = dat, Q = Q, att.prior = prior, att.dist = "fixed")
# # See the posterior weights
# extract(modp1,what = "posterior.prob")
# extract(modp1,what = "att.prior")
# # ----Linear structure of attributes -----#
# # Assuming A1 -> A2 -> A3
#   Q <- matrix(c(1,0,0,
#                 1,0,0,
#                 1,1,0,
#                 1,1,0,
#                 1,1,1,
#                 1,1,1,
#                 1,0,0,
#                 1,0,0,
#                 1,1,0,
#                 1,1,0,
#                 1,1,1,
#                 1,1,1),ncol=3,byrow=TRUE)
#  # item parameters for DINA model (guessing and slip)
#  gs <- matrix(rep(0.1,24),ncol=2)
#  N <- 5000
#  # attribute simulation
#  att <- rbind(matrix(0,nrow=500,ncol=3),
#               matrix(rep(c(1,0,0),1000),ncol=3,byrow=TRUE),
#               matrix(rep(c(1,1,0),1000),ncol=3,byrow=TRUE),
#               matrix(rep(c(1,1,1),2500),ncol=3,byrow=TRUE))
#  # data simulation
#  simD <- simGDINA(N,Q,gs.parm = gs, model = "DINA",attribute = att)
#  dat <- simD$dat
#  # setting structure: A1 -> A2 -> A3
#  # note: latent classes with prior 0 are assumed impossible
#  prior <- c(0.1,0.2,0,0,0.2,0,0,0.5)
#  out <- GDINA(dat, Q, att.prior = prior,att.str = TRUE, att.dist = "fixed", model = "DINA")
#  # check posterior dist.
#  extract(out,what = "posterior.prob")
#  extract(out,what = "att.prior")
# 
#  out2 <- GDINA(dat, Q, att.prior = prior,att.str = TRUE, att.dist = "saturated",model = "DINA")
#  # check posterior dist.
#  extract(out2,what = "posterior.prob")
#  extract(out2,what = "att.prior")
# 
# ####################################
# #          Example 5.              #
# #        Model estimations         #
# # With user-specified att structure#
# ####################################
# 
# # --- User-specified attribute structure ----#
# Q <- sim30GDINA$simQ
# K <- ncol(Q)
# # divergent structure A1->A2->A3;A1->A4->A5;A1->A4->A6
# diverg <- list(c(1,2),
#                c(2,3),
#                c(1,4),
#                c(4,5))
# struc <- att.structure(diverg,K)
# 
# # data simulation
# N <- 1000
# true.lc <- sample(c(1:2^K),N,replace=TRUE,prob=struc$att.prob)
# table(true.lc) #check the sample
# true.att <- attributepattern(K)[true.lc,]
#  gs <- matrix(rep(0.1,2*nrow(Q)),ncol=2)
#  # data simulation
#  simD <- simGDINA(N,Q,gs.parm = gs,
#                    model = "DINA",attribute = true.att)
#  dat <- extract(simD,"dat")
# 
# modp1 <- GDINA(dat = dat, Q = Q, att.prior = struc$att.prob, att.str = TRUE, att.dist = "saturated")
# modp1
# # Note that fixed priors were used for all iterations
# extract(modp1,what = "att.prior")
# # Posterior weights were slightly different
# extract(modp1,what = "posterior.prob")
# modp2 <- GDINA(dat = dat, Q = Q, att.prior = struc$att.prob, att.str = TRUE, att.dist = "fixed")
# modp2
# extract(modp2,what = "att.prior")
# extract(modp2,what = "posterior.prob")
# 
# 
# ####################################
# #           Example 6.             #
# #        Model estimations         #
# # With user-specified initial pars #
# ####################################
# 
#  # check initials to see the format for initial item parameters
#  initials <- sim10GDINA$simItempar
#  dat <- sim10GDINA$simdat
#  Q <- sim10GDINA$simQ
#  mod.ini <- GDINA(dat,Q,catprob.parm = initials)
#  extract(mod.ini,"initial.catprob")
# ####################################
# #           Example 7.             #
# #        Model estimation          #
# #          Without M-step          #
# ####################################
# 
#  # -----------Fix User-specified item parameters
#  # Item parameters are not estimated
#  # Only person attributes are estimated
#  # attribute prior distribution matters if interested in the marginalized likelihood
#  dat <- frac20$dat
#  Q <- frac20$Q
#  mod.initial <- GDINA(dat,Q,maxit=20) # estimation- only 10 iterations for illustration purposes
#  par <- itemparm(mod.initial,digits=8)
#  weights <- extract(mod.initial,"posterior.prob",digits=8) #posterior weights
#  # use the weights as the priors
#  mod.fix <- GDINA(dat,Q,catprob.parm = par,att.prior=c(weights),maxitr=0) # re-estimation
#  anova(mod.initial,mod.fix) # very similar - good approximation most of time
#  # prior used for the likelihood calculation for the last step
#  priors <- extract(mod.initial,"att.prior")
#  # use the priors as the priors
#  mod.fix2 <- GDINA(dat,Q,catprob.parm = par,att.prior=priors,maxitr=0) # re-estimation
#  anova(mod.initial,mod.fix2) # identical results
# 
# ####################################
# #           Example 8.             #
# #        polytomous attribute      #
# #          model estimation        #
# #    see Chen, de la Torre 2013    #
# ####################################
# 
# 
# # --- polytomous attribute G-DINA model --- #
# dat <- sim30pGDINA$simdat
# Q <- sim30pGDINA$simQ
# #polytomous G-DINA model
# pout <- GDINA(dat,Q)
# 
# # ----- polymous DINA model --------#
# pout2 <- GDINA(dat,Q,model="DINA")
# anova(pout,pout2)
# 
# ####################################
# #           Example 9.             #
# #        Sequential G-DINA model   #
# #    see Ma, & de la Torre 2016    #
# ####################################
# 
# # --- polytomous attribute G-DINA model --- #
# dat <- sim20seqGDINA$simdat
# Q <- sim20seqGDINA$simQ
# Q
# #    Item Cat A1 A2 A3 A4 A5
# #       1   1  1  0  0  0  0
# #       1   2  0  1  0  0  0
# #       2   1  0  0  1  0  0
# #       2   2  0  0  0  1  0
# #       3   1  0  0  0  0  1
# #       3   2  1  0  0  0  0
# #       4   1  0  0  0  0  1
# #       ...
# 
# #sequential G-DINA model
# sGDINA <- GDINA(dat,Q,sequential = TRUE)
# sDINA <- GDINA(dat,Q,sequential = TRUE,model = "DINA")
# anova(sGDINA,sDINA)
# itemparm(sDINA) # processing function
# itemparm(sDINA,"itemprob") # success probabilities for each item
# itemparm(sDINA,"LCprob") # success probabilities for each category for all latent classes
# 
# ####################################
# #           Example 10.            #
# #    Multiple-Group G-DINA model   #
# ####################################
# Q <- sim10GDINA$simQ
# 
# # parameter simulation
# # Group 1 - female
# N1 <- 2000
# gs1 <- matrix(rep(0.1,2*nrow(Q)),ncol=2)
# # Group 2 - male
# N2 <- 2000
# gs2 <- matrix(rep(0.2,2*nrow(Q)),ncol=2)
# 
# # data simulation for each group
# sim1 <- simGDINA(N1,Q,gs.parm = gs1,model = "DINA")
# sim2 <- simGDINA(N2,Q,gs.parm = gs2,model = "DINO")
# 
# # combine data
# # see ?bdiagMatrix
# dat <- bdiagMatrix(list(extract(sim1,"dat"),extract(sim2,"dat")),fill=NA)
# Q <- rbind(Q,Q)
# gr <- rep(c("female","male"),each=2000)
# # Fit G-DINA model
# mg.est <- GDINA(dat = dat,Q = Q,group = gr)
# summary(mg.est)
# extract(mg.est,"posterior.prob")
# 
# # Fit G-DINA model with different joint attribute dist.
# mg.est2 <- GDINA(dat = dat,Q = Q,group = gr,
# att.dist = c("saturated","fixed"))
# summary(mg.est2)
# 
## ---------------------------------------------

Run the code above in your browser using DataLab