get.R2: Calculate McFadden Pseudo-R2

Description

This function calculates the McFadden pseudo-$R^{2}$ ($R^{2}$) for every item. It can work directly from the responses and Q-matrix, or from an already fitted CDM.

Usage

get.R2(
  Y = NULL,
  Q = NULL,
  att.str = NULL,
  CDM.obj = NULL,
  mono.constraint = FALSE,
  model = "GDINA"
)

Value

An object of class matrix containing the $R^{2}$ for each item and each possible attribute mastery pattern.

Arguments

Y: A required $N$ × $I$ matrix or data.frame containing the responses of $N$ individuals to $I$ items. Missing values need to be coded as NA.
Q: A required binary $I$ × $K$ matrix indicating which attributes are required to master each item. The $i$th row of the matrix is a binary indicator vector showing which attributes are not required (coded 0) and which are required (coded 1) to master item $i$.
att.str: Specifies the attribute structure. The default, NULL, means there is no structure. An attribute structure needs to be specified as a list, which is handled internally by the att.structure function. See examples. It can also be a matrix giving all permissible attribute profiles.
CDM.obj: An object of class CDM.obj. Can be NULL; when it is not NULL, $R^{2}$ can be calculated rapidly without re-estimating the model parameters. See also CDM.
mono.constraint: Logical indicating whether monotonicity constraints should be fulfilled in estimation. Default = FALSE.
model: Type of model to fit; can be "GDINA", "LCDM", "DINA", "DINO", "ACDM", "LLM", or "rRUM". Default = "GDINA".
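Because att.str also accepts a matrix of permissible attribute profiles, the base-R sketch below shows one way such a matrix might be built. It assumes a linear hierarchy A1 → A2 → A3; the hierarchy, the prereq table, and the name att.str.mat are illustrative assumptions, not part of Qval. The sketch enumerates all $2^{K}$ profiles and keeps only those in which no attribute is mastered without its prerequisite.

```r
# Illustrative only: the linear hierarchy A1 -> A2 -> A3 is an assumption;
# edit 'prereq' to describe your own attribute structure.
K <- 3

# All 2^K binary attribute profiles, one profile per row
all.profiles <- as.matrix(expand.grid(rep(list(0:1), K)))
colnames(all.profiles) <- paste0("A", 1:K)

# Each row states that attribute 'pre' must be mastered before 'post'
prereq <- rbind(c(pre = 1, post = 2),
                c(pre = 2, post = 3))

# A profile is permissible if no 'post' attribute is mastered
# while its 'pre' attribute is not
permissible <- apply(all.profiles, 1, function(alpha) {
  all(alpha[prereq[, "post"]] <= alpha[prereq[, "pre"]])
})

att.str.mat <- all.profiles[permissible, , drop = FALSE]
rownames(att.str.mat) <- NULL
print(att.str.mat)  # profiles 000, 100, 110, 111
```

The resulting matrix could then be passed as att.str, for example get.R2(Y = data$dat, Q = Q, att.str = att.str.mat). This assumes get.R2 expects one profile per row, with the attributes in the same column order as Q.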

Author

Haijiang Qin <Haijiang133@outlook.com>

Details

The McFadden pseudo-$R^{2}$ (McFadden, 1974) is a model-fit index that quantifies how much a fitted model improves on a null model in accounting for the observed responses. It plays the same role in logistic regression models that the squared multiple correlation coefficient plays in linear models. In a CDM, the probability of a correct response to each item is predicted for every examinee, and the McFadden pseudo-$R^{2}$ measures how well these predictions agree with the responses actually observed. For item $i$ it is computed as

$$ R_{i}^{2} = 1 - \frac{\log(L_{im})}{\log(L_{i0})} $$

where $\log(L_{im})$ is the log-likelihood of the fitted model and $\log(L_{i0})$ is the log-likelihood of the null model. If $N$ examinees take a test comprising $I$ items, $\log(L_{im})$ is computed as

$$ \log(L_{im}) = \sum_{p=1}^{N} \log \sum_{l=1}^{2^{K^\ast}} \pi(\boldsymbol{\alpha}_{l}^{\ast} | \boldsymbol{X}_{p}) P_{i}(\boldsymbol{\alpha}_{l}^{\ast})^{X_{pi}} \left[ 1-P_{i}(\boldsymbol{\alpha}_{l}^{\ast}) \right] ^{(1-X_{pi})} $$

where $K^{\ast}$ is the number of attributes required by item $i$, $\boldsymbol{\alpha}_{l}^{\ast}$ is the $l$th reduced attribute mastery pattern, $\pi(\boldsymbol{\alpha}_{l}^{\ast} | \boldsymbol{X}_{p})$ is the posterior probability that examinee $p$ has pattern $\boldsymbol{\alpha}_{l}^{\ast}$ given the response vector $\boldsymbol{X}_{p}$, $P_{i}(\boldsymbol{\alpha}_{l}^{\ast})$ is the probability of a correct response to item $i$ given pattern $\boldsymbol{\alpha}_{l}^{\ast}$, and $X_{pi}$ is examinee $p$'s response to item $i$. Let $\bar{X}_{i}$ be the proportion of the $N$ examinees who respond correctly to item $i$. Then $\log(L_{i0})$ is computed as

$$ \log(L_{i0}) = \sum_{p=1}^{N} \log \left[ {\bar{X}_{i}}^{X_{pi}} {(1-\bar{X}_{i})}^{(1-X_{pi})} \right] $$
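To make the formulas concrete, here is a minimal base-R sketch of $R_{i}^{2}$ for a single item. It is not the internal code of get.R2. The helper name mcfadden.R2 and its three inputs are hypothetical: a 0/1 response vector, an $N \times 2^{K^\ast}$ matrix of posterior probabilities over the reduced patterns, and the item's success probability under each reduced pattern. The toy numbers are invented so that the result can be checked by hand.

```r
# Hypothetical helper: R_i^2 = 1 - log(L_im) / log(L_i0) for one item i.
# A sketch of the formulas above, not Qval's implementation.
#   x     : length-N vector of 0/1 responses X_pi (NA values are dropped)
#   post  : N x L matrix of posteriors pi(alpha_l* | X_p), L = 2^K*,
#           one column per reduced pattern, rows summing to 1
#   p.cor : length-L vector of success probabilities P_i(alpha_l*)
mcfadden.R2 <- function(x, post, p.cor) {
  keep <- !is.na(x)
  x <- x[keep]
  post <- post[keep, , drop = FALSE]

  # Bernoulli likelihood of each response under each reduced pattern (N x L)
  lik <- outer(x, p.cor, function(xx, p) p^xx * (1 - p)^(1 - xx))
  # log(L_im): posterior-weighted sum over patterns, log, sum over examinees
  logL.m <- sum(log(rowSums(post * lik)))

  # log(L_i0): the null model predicts the proportion correct for everyone
  # (undefined when all responses are 0 or all are 1)
  x.bar <- mean(x)
  logL.0 <- sum(log(x.bar^x * (1 - x.bar)^(1 - x)))

  1 - logL.m / logL.0
}

# Toy data: two reduced patterns (non-mastery, mastery), four examinees
x <- c(1, 1, 0, 0)
post <- rbind(c(0, 1), c(0, 1), c(1, 0), c(1, 0))
p.cor <- c(0.1, 0.9)
r2 <- mcfadden.R2(x, post, p.cor)
print(round(r2, 3))  # 0.848, i.e. 1 - log(0.9) / log(0.5)
```

If p.cor is set to c(0.5, 0.5), every examinee receives the same prediction as under the null model, so the helper returns 0.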

References

McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in econometrics (pp. 105–142). Academic Press.

Najera, P., Sorrel, M. A., de la Torre, J., & Abad, F. J. (2021). Balancing fit and parsimony to improve Q-matrix validation. British Journal of Mathematical and Statistical Psychology, 74, 110–130. DOI: 10.1111/bmsp.12228.

Qin, H., & Guo, L. (2023). Using machine learning to improve Q-matrix validation. Behavior Research Methods. DOI: 10.3758/s13428-023-02126-0.

Examples


# \donttest{
library(Qval)

set.seed(123)

## generate Q-matrix and data
K <- 3
I <- 20
Q <- sim.Q(K, I)
IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)
data <- sim.data(Q = Q, N = 500, IQ = IQ, model = "GDINA", distribute = "horder")

## calculate R2 directly
R2 <- get.R2(Y = data$dat, Q = Q)
print(R2)

## calculate R2 after fitting CDM
CDM.obj <- CDM(data$dat, Q, model = "GDINA")
R2 <- get.R2(CDM.obj = CDM.obj)
print(R2)
# }
