crdif: Residual-Based DIF Detection Framework Using Categorical Residuals (RDIF-CR)

Description

This function computes the three statistics of the residual-based DIF detection framework using categorical residuals (RDIF-CR), namely \(RDIF_{R}-CR\), \(RDIF_{S}-CR\), and \(RDIF_{RS}-CR\), for detecting global differential item functioning (DIF), particularly in polytomously scored items. The RDIF-CR framework evaluates DIF by comparing categorical residual vectors between groups. A categorical residual vector is the difference between a one-hot encoded response vector (1 for the selected category and 0 for all others) and the vector of IRT model-predicted probabilities across all score categories. This approach enables fine-grained detection of global DIF patterns at the category level.
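
A minimal base-R sketch of the categorical residual for a single response, assuming score categories coded 0, 1, ..., K - 1 and model-predicted category probabilities that are already available. The helper cat_resid() is hypothetical and not part of irtQ; the input values are illustrative only.

```r
# Hypothetical helper (not part of irtQ): categorical residual for one response.
# resp:  observed score category, coded 0, 1, ..., K - 1
# probs: model-predicted probabilities for categories 0, ..., K - 1
cat_resid <- function(resp, probs) {
  stopifnot(abs(sum(probs) - 1) < 1e-8, resp %in% (seq_along(probs) - 1))
  onehot <- numeric(length(probs))  # vector of zeros, one per category
  onehot[resp + 1] <- 1             # 1 for the selected category
  onehot - probs                    # observed minus predicted, per category
}

r <- cat_resid(resp = 2, probs = c(0.1, 0.2, 0.4, 0.25, 0.05))
r       # c(-0.1, -0.2, 0.6, -0.25, -0.05)
sum(r)  # 0 up to floating-point rounding
```

Because the one-hot vector and the probability vector each sum to 1, the categorical residual vector always sums to 0.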

Usage

crdif(x, ...)
# S3 method for default
crdif(
  x,
  data,
  score = NULL,
  group,
  focal.name,
  item.skip = NULL,
  D = 1,
  alpha = 0.05,
  missing = NA,
  purify = FALSE,
  purify.by = c("crdifrs", "crdifr", "crdifs"),
  max.iter = 10,
  min.resp = NULL,
  method = "ML",
  range = c(-5, 5),
  norm.prior = c(0, 1),
  nquad = 41,
  weights = NULL,
  ncore = 1,
  verbose = TRUE,
  ...
)
# S3 method for est_irt
crdif(
  x,
  score = NULL,
  group,
  focal.name,
  item.skip = NULL,
  alpha = 0.05,
  missing = NA,
  purify = FALSE,
  purify.by = c("crdifrs", "crdifr", "crdifs"),
  max.iter = 10,
  min.resp = NULL,
  method = "ML",
  range = c(-5, 5),
  norm.prior = c(0, 1),
  nquad = 41,
  weights = NULL,
  ncore = 1,
  verbose = TRUE,
  ...
)
# S3 method for est_item
crdif(
  x,
  group,
  focal.name,
  item.skip = NULL,
  alpha = 0.05,
  missing = NA,
  purify = FALSE,
  purify.by = c("crdifrs", "crdifr", "crdifs"),
  max.iter = 10,
  min.resp = NULL,
  method = "ML",
  range = c(-5, 5),
  norm.prior = c(0, 1),
  nquad = 41,
  weights = NULL,
  ncore = 1,
  verbose = TRUE,
  ...
)

Value

This function returns a list containing four main components:

no_purify

A list of sub-objects containing the results of DIF analysis without applying a purification procedure. The sub-objects include:

dif_stat: A data frame summarizing the RDIF-CR analysis results for all items. Columns include the item ID; \(RDIF_{R}-CR\), \(RDIF_{S}-CR\), and \(RDIF_{RS}-CR\), each followed by its degrees of freedom; the associated p-values; and the sample sizes of the reference and focal groups.

moments: A list containing the first and second moments (means and covariance matrices) of the RDIF-CR statistics. The elements are mu.crdifr, mu.crdifs, and mu.crdifrs (means) and cov.crdifr, cov.crdifs, and cov.crdifrs (covariance matrices), each indexed by item ID.

dif_item: A list of three numeric vectors, crdifr, crdifs, and crdifrs, identifying the items flagged as DIF by the corresponding statistic.

score: A numeric vector of ability estimates used to compute the RDIF-CR statistics. These may be user-supplied or internally estimated.

purify

A logical value indicating whether a purification procedure was applied.

with_purify

A list of sub-objects containing the results of DIF analysis after applying the purification procedure. The sub-objects include:

purify.by: A character string indicating the RDIF-CR statistic used for purification. Possible values are "crdifr", "crdifs", or "crdifrs".

dif_stat: A data frame summarizing the final RDIF-CR statistics after purification. It has the same structure as dif_stat in no_purify, with an additional column indicating the iteration at which each result was obtained.

moments: A list of moments (means and covariance matrices) of the RDIF-CR statistics for all items, updated based on the final iteration.

dif_item: A list of three numeric vectors identifying the items flagged as DIF at any iteration by each statistic.

n.iter: An integer indicating the number of iterations performed during the purification procedure.

score: A numeric vector of updated ability estimates used in the final iteration.

complete: A logical value indicating whether the purification process converged. If FALSE, the maximum number of iterations was reached before convergence.

alpha

A numeric value indicating the significance level (\(\alpha\)) used for hypothesis testing with RDIF-CR statistics.

Arguments

x

A data frame containing item metadata (e.g., item parameters, numbers of score categories, and IRT model types), an object of class est_irt obtained from est_irt(), or an object of class est_item obtained from est_item().

See est_irt() or simdat() for more details about the item metadata. This data frame can be easily created using the shape_df() function.

...

Additional arguments passed to the est_score() function.

data

A matrix of examinees' item responses corresponding to the items specified in the x argument. Rows represent examinees and columns represent items.

score

A numeric vector containing examinees' ability estimates (theta values). If not provided, crdif() will estimate ability parameters internally before computing the RDIF-CR statistics. See est_score() for more information on scoring methods. Default is NULL.

group

A numeric or character vector indicating examinees' group membership. The length of the vector must match the number of rows in the response data matrix.

focal.name

A single numeric or character value specifying the focal group. For instance, given group = c(0, 1, 0, 1, 1) and '1' indicating the focal group, set focal.name = 1.

item.skip

A numeric vector of item indices to exclude from DIF analysis. If NULL, all items are included. This is useful for omitting specific items based on prior knowledge.

D

A scaling constant used in IRT models to make the logistic function closely approximate the normal ogive function. A value of 1.7 is commonly used for this purpose. Default is 1.
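
To see why a value near 1.7 is used, the short base-R check below compares the logistic function scaled by D with the standard normal cumulative distribution function (the normal ogive) over a fine grid. It uses only the stats functions plogis() and pnorm(); the helper name max_gap() is illustrative.

```r
# Grid of ability values on which the two curves are compared
x <- seq(-4, 4, by = 0.001)

# Largest absolute gap between the D-scaled logistic CDF and the normal CDF
max_gap <- function(D) max(abs(plogis(D * x) - pnorm(x)))

max_gap(1)    # exceeds 0.1
max_gap(1.7)  # stays below 0.01
```

With D = 1.7 the largest gap on the grid stays below 0.01, whereas with D = 1 it exceeds 0.1.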

alpha

A numeric value specifying the significance level (\(\alpha\)) for hypothesis testing using the RDIF-CR statistics. Default is 0.05.
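
The sketch below shows, in generic form, how a significance level is applied to a chi-square test statistic built from a mean vector and its covariance matrix (a Wald-type quadratic form). The helper wald_chisq() and its input values are hypothetical, and whether crdif() forms its statistics exactly this way is an assumption of this sketch.

```r
# Hypothetical helper (not part of irtQ): generic Wald-type chi-square test.
# d:     estimated mean difference vector (e.g., focal minus reference)
# Sigma: covariance matrix of d under the null hypothesis of no DIF
wald_chisq <- function(d, Sigma, alpha = 0.05) {
  stat <- as.numeric(crossprod(d, solve(Sigma, d)))  # d' Sigma^{-1} d
  df <- length(d)
  p <- pchisq(stat, df = df, lower.tail = FALSE)     # upper-tail p-value
  list(stat = stat, df = df, p.value = p, flagged = p < alpha)
}

# Illustrative inputs only; these are not output of crdif()
res <- wald_chisq(d = c(0.5, -0.2), Sigma = diag(c(0.04, 0.04)))
res$stat     # 7.25
res$flagged  # TRUE at alpha = 0.05
```

For a categorical residual vector over K categories, the elements sum to zero, so the full K x K covariance matrix is singular; a sketch like this one would need to drop one category (leaving K - 1 degrees of freedom) or use a generalized inverse.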

missing

A value indicating missing responses in the data set. Default is NA.

purify

Logical. Indicates whether to apply a purification procedure. Default is FALSE.

purify.by

A character string specifying which RDIF-CR statistic is used to perform the purification. Available options are "crdifrs" for \(RDIF_{RS}-CR\), "crdifr" for \(RDIF_{R}-CR\), and "crdifs" for \(RDIF_{S}-CR\).

max.iter

A positive integer specifying the maximum number of iterations allowed for the purification process. Default is 10.

min.resp

A positive integer specifying the minimum number of valid item responses required from an examinee in order to compute an ability estimate. Default is NULL.

method

A character string indicating the scoring method to use. Available options are:

"ML": Maximum likelihood estimation
"WL": Weighted likelihood estimation (Warm, 1989)
"MAP": Maximum a posteriori estimation (Hambleton et al., 1991)
"EAP": Expected a posteriori estimation (Bock & Mislevy, 1982)

Default is "ML".

range

A numeric vector of length two specifying the lower and upper bounds of the ability scale. This is used for the following scoring methods: "ML", "WL", and "MAP". Default is c(-5, 5).
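
As a rough illustration of how range bounds the "ML" search, the sketch below maximizes a graded response model (GRM) log-likelihood with stats::optimize() over range. It assumes the logistic GRM parameterization P(X >= k) = 1 / (1 + exp(-D a (theta - b_k))) and uses the hypothetical helpers grm_probs() and ml_theta(); it is not the est_score() implementation.

```r
# Hypothetical helper: GRM category probabilities for one item at one theta.
# a: slope; b: ordered thresholds (length K - 1); D: scaling constant
grm_probs <- function(theta, a, b, D = 1) {
  # cumulative probabilities P(X >= k), padded with 1 (k = 0) and 0 (k = K)
  pstar <- c(1, plogis(D * a * (theta - b)), 0)
  -diff(pstar)  # category probabilities P(X = k), k = 0, ..., K - 1
}

# Hypothetical helper: ML ability estimate searched only within `range`
ml_theta <- function(resp, a, b, D = 1, range = c(-5, 5)) {
  loglik <- function(theta) {
    sum(mapply(function(r, ai, bi) log(grm_probs(theta, ai, bi, D)[r + 1]),
               resp, a, b))
  }
  optimize(loglik, interval = range, maximum = TRUE)$maximum
}

a <- c(1, 1)
b <- list(c(-1, 0, 1), c(-1, 0, 1))
ml_theta(resp = c(1, 2), a = a, b = b)  # close to 0 (symmetric pattern)
ml_theta(resp = c(0, 0), a = a, b = b)  # pushed to the lower bound of range
```

For a response pattern in the lowest category on every item, the likelihood keeps increasing as theta decreases, so the estimate ends up at the lower bound of range, which illustrates why a finite range is needed for ML estimation.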

norm.prior

A numeric vector of length two specifying the mean and standard deviation of the normal prior distribution. These values are used to generate the Gaussian quadrature points and weights. Ignored if method is "ML" or "WL". Default is c(0, 1).

nquad

An integer indicating the number of Gaussian quadrature points to be generated from the normal prior distribution. Used only when method is "EAP". Ignored for "ML", "WL", and "MAP". Default is 41.

weights

A two-column matrix or data frame containing the quadrature points (in the first column) and their corresponding weights (in the second column) for the latent variable prior distribution. The weights and points can be conveniently generated using the function gen.weight().

If NULL and method = "EAP", default quadrature values are generated based on the norm.prior and nquad arguments. Ignored if method is "ML", "WL", or "MAP".
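
A minimal sketch of a two-column points-and-weights table in the layout described for weights, assuming equally spaced points over the prior mean plus or minus 4 standard deviations and normal-density weights rescaled to sum to 1. The helper make_quad() is hypothetical; gen.weight() may place and weight the points differently.

```r
# Hypothetical helper (not gen.weight()): quadrature points and weights
# for a normal prior with mean norm.prior[1] and SD norm.prior[2]
make_quad <- function(nquad = 41, norm.prior = c(0, 1)) {
  mu <- norm.prior[1]
  sigma <- norm.prior[2]
  theta <- seq(mu - 4 * sigma, mu + 4 * sigma, length.out = nquad)
  w <- dnorm(theta, mean = mu, sd = sigma)
  data.frame(theta = theta, weight = w / sum(w))  # points first, weights second
}

quad <- make_quad(nquad = 41, norm.prior = c(0, 1))
sum(quad$weight)               # 1 up to rounding
sum(quad$theta * quad$weight)  # approximately 0, the prior mean
```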

ncore

An integer specifying the number of logical CPU cores to use for parallel processing. Default is 1. See est_score() for details.

verbose

Logical. If TRUE, progress messages from the purification procedure will be displayed; if FALSE, the messages will be suppressed. Default is TRUE.

Methods (by class)

crdif(default): Default method for computing the three RDIF-CR statistics using a data frame x that contains item metadata.
crdif(est_irt): Method for an object of class est_irt created by the function est_irt().
crdif(est_item): Method for an object of class est_item created by the function est_item().

Author

Hwanggyu Lim hglim83@gmail.com

Details

According to Penfield (2010), differential item functioning (DIF) in polytomously scored items can be conceptualized in two forms: global DIF and net DIF. Global DIF refers to differences between groups in the conditional probabilities of responding in specific score categories, thus offering a fine-grained view of DIF at the category level. In contrast, net DIF summarizes these differences into a single value representing the overall impact of DIF on the item’s expected score.
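
The difference between the two forms can be seen in a toy case constructed for illustration (the probabilities are hypothetical, not taken from Penfield, 2010): at a fixed theta, two groups have different conditional category probabilities for a three-category item but the same expected item score.

```r
# Hypothetical conditional category probabilities at one theta,
# for a 3-category item scored 0, 1, 2
p_ref <- c(0.25, 0.50, 0.25)  # reference group
p_foc <- c(0.50, 0.00, 0.50)  # focal group
scores <- 0:2

# Global view: category-level differences in conditional probabilities
global_diff <- p_foc - p_ref
global_diff  # c(0.25, -0.50, 0.25)

# Net view: difference in expected item scores
net_diff <- sum(scores * p_foc) - sum(scores * p_ref)
net_diff     # 0
```

The nonzero category-level differences indicate global DIF even though the net difference in expected scores is zero, which is the kind of pattern a net-DIF summary alone can miss.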

The RDIF framework using categorical residuals (RDIF-CR), implemented in crdif(), extends the original residual-based DIF framework proposed by Lim et al. (2022) to detect global DIF in polytomous items. This framework includes three statistics: \(RDIF_{R}-CR\), \(RDIF_{S}-CR\), and \(RDIF_{RS}-CR\), each designed to capture different aspects of group-level differences in categorical response patterns.

To illustrate how the RDIF-CR framework operates, consider an item with five ordered score categories (\(k \in \{0,1,2,3,4\}\)). Suppose an examinee with latent ability \(\theta\) responds in category 2. The one-hot encoded response vector for this response is \((0,0,1,0,0)^T\). Assume that the IRT model predicts the category probabilities as \((0.1, 0.2, 0.4, 0.25, 0.05)^T\), which implies an expected item score of \(0(0.1) + 1(0.2) + 2(0.4) + 3(0.25) + 4(0.05) = 1.95\). In the RDIF-CR framework, the categorical residual vector is calculated by subtracting the predicted probability vector from the one-hot response vector, resulting in \((-0.1, -0.2, 0.6, -0.25, -0.05)^T\).

In contrast to the RDIF-CR framework, net DIF is assessed using a unidimensional item score residual, that is, the observed item score minus the expected item score. In this example, the residual would be \(2 - 1.95 = 0.05\). For detecting net DIF, use the rdif() function instead.

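Note that for dichotomous items, crdif() and rdif() yield identical results. For a binary item, the categorical residual vector has only two elements, \(-(u - P)\) and \(u - P\), where \(u\) is the 0/1 response and \(P\) is the model-predicted probability of a response of 1. Both elements are determined by the single item score residual \(u - P\) used by rdif(), so the global and net DIF evaluations are mathematically equivalent.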
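
The algebra behind the dichotomous case can be checked directly: with response u and predicted probability p, the categorical residual vector is ((1 - u) - (1 - p), u - p) = (-(u - p), u - p). The base-R check below verifies this for both possible responses; it does not call crdif() or rdif(), and bin_cat_resid() is a hypothetical helper.

```r
# Hypothetical helper: categorical residual for a binary item (categories 0, 1)
# u: observed 0/1 response; p: model-predicted probability of a response of 1
bin_cat_resid <- function(u, p) c(u == 0, u == 1) - c(1 - p, p)

p <- 0.7
for (u in c(0, 1)) {
  score_res <- u - p  # item score residual, as used for net DIF
  # the two categorical residuals are -(u - p) and (u - p)
  stopifnot(isTRUE(all.equal(bin_cat_resid(u, p), c(-score_res, score_res))))
}
```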

References

Lim, H., Choe, E. M., & Han, K. T. (2022). A residual-based differential item functioning detection framework in item response theory. Journal of Educational Measurement, 59(1), 80–104. doi:10.1111/jedm.12313

Penfield, R. D. (2010). Distinguishing between net and global DIF in polytomous items. Journal of Educational Measurement, 47(2), 129–149.

Examples


# \donttest{

############################################################################
# This example demonstrates how to detect global DIF in polytomous items
# using the RDIF-CR framework implemented in `irtQ::crdif()`.
# Simulated response data are generated from 5 GRM items with 4 score
# categories. DIF is introduced in the 1st and 5th items.
############################################################################

###############################################
# (1) Simulate response data with DIF
###############################################

library(irtQ)

set.seed(1)

# Generate ability parameters for 1000 examinees in each group
# Reference and focal groups follow N(0, 1.5^2)
theta_ref <- rnorm(1000, 0, 1.5)
theta_foc <- rnorm(1000, 0, 1.5)

# Combine abilities from both groups
theta_all <- c(theta_ref, theta_foc)

# Define item parameters using `irtQ::shape_df()`
# Items 1 and 5 are intentionally modified to exhibit DIF
par_ref <- irtQ::shape_df(
  par.prm = list(
    a = c(1, 1, 1, 2, 2),
    d = list(c(-2, 0, 1),
             c(-2, 0, 2),
             c(-2, 0, 1),
             c(-1, 0, 2),
             c(-2, 0, 0.5))
  ),
  cats = 4, model = "GRM"
)

par_foc <- irtQ::shape_df(
  par.prm = list(
    a = c(2, 1, 1, 2, 0.5),
    d = list(c(-0.5, 0, 0.5),
             c(-2, 0, 2),
             c(-2, 0, 1),
             c(-1, 0, 2),
             c(-1.5, -1, 0))
  ),
  cats = 4, model = "GRM"
)

# Generate response data
resp_ref <- irtQ::simdat(x = par_ref, theta = theta_ref, D = 1)
resp_foc <- irtQ::simdat(x = par_foc, theta = theta_foc, D = 1)

# Combine response data across groups
data <- rbind(resp_ref, resp_foc)

###############################################
# (2) Estimate item and ability parameters
###############################################

# Estimate GRM item parameters using `irtQ::est_irt()`
fit_mod <- irtQ::est_irt(data = data, D = 1, model = "GRM", cats = 4)

# Extract estimated item parameters
x <- fit_mod$par.est

# Estimate ability scores using the ML method
score <- irtQ::est_score(x = x, data = data, method = "ML")$est.theta

###############################################
# (3) Perform RDIF-CR DIF analysis
###############################################

# Define group membership: 1 = focal group
group <- c(rep(0, 1000), rep(1, 1000))

# (a) DIF detection without purification
dif_nopuri <- crdif(
  x = x, data = data, score = score,
  group = group, focal.name = 1, D = 1, alpha = 0.05
)
print(dif_nopuri)

# (b) DIF detection with purification using RDIF_{R}-CR
dif_puri_1 <- crdif(
  x = x, data = data, score = score,
  group = group, focal.name = 1, D = 1, alpha = 0.05,
  purify = TRUE, purify.by = "crdifr"
)
print(dif_puri_1)

# (c) DIF detection with purification using RDIF_{S}-CR
dif_puri_2 <- crdif(
  x = x, data = data, score = score,
  group = group, focal.name = 1, D = 1, alpha = 0.05,
  purify = TRUE, purify.by = "crdifs"
)
print(dif_puri_2)

# (d) DIF detection with purification using RDIF_{RS}-CR
dif_puri_3 <- crdif(
  x = x, data = data, score = score,
  group = group, focal.name = 1, D = 1, alpha = 0.05,
  purify = TRUE, purify.by = "crdifrs"
)
print(dif_puri_3)

# }
