difR (version 6.1.0)

ILHTEdif: Detect DIF via the IL-HTE mixed model

Description

Implements the item-level heterogeneous treatment effects (IL-HTE) mixed model for detecting differential item functioning (DIF), with optional total- or rest-score matching and purification. The model is $$\operatorname{logit}\{P(Y_{ij}=1)\} = \theta_j + b_i + \zeta_i T_j,$$ with a person term $$\theta_j = \beta_0 + \beta_1 T_j + \varepsilon_j,$$ and item-specific random effects \((b_i, \zeta_i)\) jointly normal. Here \(T_j\) is the group-membership indicator of subject \(j\), and \(\zeta_i\) is the group effect specific to item \(i\).

Usage

ILHTEdif(resp_mat, group, subject_ids = NULL, alpha = 0.05,
         nAGQ = 1, purify = FALSE,
         match = c("none", "total", "restscore"),
         maxIter = 2)

Value

A list with components:

model

Fitted glmer object (final iteration).

itemDIF

data.frame with item IDs and random-slope estimates \(\hat{\zeta}_i\).

itemSig

Subset of itemDIF where \(|\hat{\zeta}_i| > \mathrm{crit}\).

crit

Numeric. Decision threshold \(z_{1-\alpha/2}\times \mathrm{SD}(\zeta)\).

plot

A ggplot object showing \(\hat{\zeta}_i\) with the \(\pm\) threshold.
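
The relation between crit, itemDIF and itemSig can be illustrated with a few lines of base R. This is only a sketch of the documented rule crit = qnorm(1 - alpha/2) * SD(zeta); the standard deviation and the slope estimates below are made-up values, and the object names (sd_zeta, zeta_hat, flagged) are hypothetical rather than names used inside ILHTEdif.

```r
# Significance level, as in the Usage section
alpha <- 0.05

# Assumed random-slope standard deviation (illustrative value only;
# in practice it comes from the fitted mixed model)
sd_zeta <- 0.40

# Decision threshold: z_{1 - alpha/2} * SD(zeta)
crit <- qnorm(1 - alpha / 2) * sd_zeta

# Hypothetical item-level slope estimates
zeta_hat <- c(item1 = 0.12, item2 = -0.95, item3 = 0.81, item4 = -0.30)

# Items whose absolute slope exceeds the threshold are flagged
flagged <- names(zeta_hat)[abs(zeta_hat) > crit]
```

With these illustrative numbers, only the items whose estimate lies outside the band of plus or minus crit end up in flagged, which is the role itemSig plays in the returned list.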

Arguments

resp_mat

A numeric matrix or data.frame of binary responses (0/1), rows = subjects, columns = items.

group

A vector of length nrow(resp_mat) indicating group membership (factor with two levels or numeric 0/1; the second level is treated as the focal group).

subject_ids

Optional vector of subject IDs (length nrow(resp_mat)); defaults to 1:nrow(resp_mat).

alpha

Numeric in \((0,1)\). Two-sided significance level used to form the decision threshold \(\pm z_{1-\alpha/2}\,\mathrm{SD}(\zeta)\).

nAGQ

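Integer. Number of adaptive Gauss-Hermite quadrature points passed to glmer. In lme4, 1 (the default) corresponds to the Laplace approximation; 0 is faster but less exact, because the fixed effects are then estimated within the penalized iteratively reweighted least squares step.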

purify

Logical. If TRUE, perform iterative purification up to maxIter by recomputing the matching score after removing flagged items.

match

Character. Matching method: "none" (no matching covariate), "total" (total score over all items), or "restscore" (total excluding currently flagged items in later purification iterations).

maxIter

Integer. Maximum number of purification iterations (default 2).
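
Because the second factor level is treated as the focal group, it can help to set the level order explicitly before calling the function. The following base R sketch uses made-up labels ("M", "F") and hypothetical object names; the 0/1 recoding assumes that 1 marks the focal group, which is how the description of group reads but is not spelled out there.

```r
# Hypothetical raw group labels
g_raw <- c("F", "M", "M", "F", "M")

# By default factor() sorts levels alphabetically, so "M" would be
# the second level here
default_levels <- levels(factor(g_raw))

# Set the level order explicitly so that "F" becomes the second level
group <- factor(g_raw, levels = c("M", "F"))

# Equivalent numeric coding, assuming 1 marks the focal group
group01 <- as.integer(group == "F")
```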

Author

Sebastien Beland
Faculte des sciences de l'education
Universite de Montreal (Canada)
sebastien.beland@umontreal.ca

Josh Gilbert
Harvard Graduate School of Education
Harvard University (USA)
josh.b.gilbert@gmail.com

Details

Let \(Y_{ij}\in\{0,1\}\) be the response of subject \(j\) to item \(i\). The IL-HTE model is fitted as a generalized linear mixed model (GLMM): $$\operatorname{logit}\{P(Y_{ij}=1)\} = \theta_j + b_i + \zeta_i T_j + \gamma S_j,$$ where \(b_i\) is an item intercept, \(\zeta_i\) an item-specific group slope (random effect), \(T_j\) indicates whether subject \(j\) belongs to the focal group, and \(S_j\) is an optional matching score (total or rest score). The person term \(\theta_j\) is modeled as $$\theta_j = \beta_0 + \beta_1 T_j + \varepsilon_j$$ with \(\varepsilon_j\) random across subjects. The item random effects \((b_i, \zeta_i)\) are assumed jointly normal with unstructured covariance.
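
To make the model structure concrete, here is a minimal sketch of how such a GLMM could be specified with lme4::glmer on data in long format (one row per subject-by-item response). It is not taken from the ILHTEdif source: the object names (long, ilhte_formula, zeta_hat, sd_zeta) and the simulated data are assumptions made for illustration, and the formula the function uses internally may differ. The fit runs only if lme4 is installed.

```r
set.seed(1)
n_subj <- 200
n_item <- 10

# Simulated data under the model: persons in rows, items in columns
group <- rep(c(0, 1), each = n_subj / 2)   # 0 = reference, 1 = focal
theta <- rnorm(n_subj)                     # person terms
b     <- rnorm(n_item)                     # item intercepts
zeta  <- rnorm(n_item, sd = 0.5)           # item-specific group slopes
eta   <- outer(theta, b, "+") + outer(group, zeta)
resp_mat <- matrix(rbinom(length(eta), 1, plogis(eta)), nrow = n_subj,
                   dimnames = list(NULL, paste0("I", seq_len(n_item))))

# Long format: as.vector() stacks the matrix column by column,
# so subjects vary fastest within each item
long <- data.frame(
  id    = factor(rep(seq_len(n_subj), times = n_item)),
  item  = factor(rep(colnames(resp_mat), each = n_subj)),
  group = rep(group, times = n_item),
  resp  = as.vector(resp_mat)
)

# beta_0 + beta_1 * T : fixed intercept and group effect
# (1 | id)            : person random intercept (epsilon)
# (1 + group | item)  : item intercept b and item-specific slope zeta,
#                       with unstructured covariance
ilhte_formula <- resp ~ group + (1 | id) + (1 + group | item)

if (requireNamespace("lme4", quietly = TRUE)) {
  fit <- lme4::glmer(ilhte_formula, data = long, family = binomial, nAGQ = 1)
  # Predicted item-specific slopes and their estimated standard deviation
  zeta_hat <- lme4::ranef(fit)$item[["group"]]
  sd_zeta  <- attr(lme4::VarCorr(fit)$item, "stddev")[["group"]]
}
```

A matching score would enter as one more fixed-effect term in the formula, for example a per-subject score column added to long.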

Iterative purification, when enabled, proceeds by (i) fitting the GLMM, (ii) flagging items with \(|\hat{\zeta}_i| > \mathrm{crit}\), where crit = qnorm(1 - alpha/2) * SD(zeta) and SD(zeta) is the estimated standard deviation of the random slopes, (iii) recomputing the matching score excluding flagged items (match = "restscore") or including all items (match = "total"), and (iv) refitting until convergence or until maxIter iterations are reached.
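
The matching scores and the purification loop described above can be sketched in base R as follows. The helper purify_loop and its flag_fun argument are hypothetical stand-ins: flag_fun represents steps (i) and (ii), that is, fitting the model and returning the names of the flagged items. The stopping rule used here (the flagged set no longer changes) is an assumption about what convergence means, not something stated in this page.

```r
# Small illustrative response matrix (3 subjects, 4 items)
resp_mat <- matrix(c(1, 0, 1, 1,
                     0, 0, 1, 0,
                     1, 1, 1, 0),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(NULL, paste0("I", 1:4)))

# Total score: sum over all items
total_score <- rowSums(resp_mat)

# Rest score: sum over items that are not currently flagged
flagged    <- "I2"   # hypothetical flagged item
keep       <- setdiff(colnames(resp_mat), flagged)
rest_score <- rowSums(resp_mat[, keep, drop = FALSE])

# Sketch of the iteration; flag_fun(resp_mat, score) is a hypothetical
# function returning the names of the items flagged in the current fit
purify_loop <- function(resp_mat, flag_fun, max_iter = 2) {
  flagged <- character(0)
  for (iter in seq_len(max_iter)) {
    keep  <- setdiff(colnames(resp_mat), flagged)
    score <- rowSums(resp_mat[, keep, drop = FALSE])
    new_flagged <- flag_fun(resp_mat, score)
    if (setequal(new_flagged, flagged)) break   # assumed stopping rule
    flagged <- new_flagged
  }
  flagged
}

# Usage with a dummy flagging function that always flags item I2
result <- purify_loop(resp_mat, function(x, s) "I2", max_iter = 3)
```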

Note: fitting the model can take a long time.

References

Gilbert, J. B. (2024). Modeling item-level heterogeneous treatment effects: A tutorial with the glmer function from the lme4 package in R. Behavior Research Methods, 56, 5055–5067. https://doi.org/10.3758/s13428-023-02245-8

Examples

## Not run: 
# With real data

data(verbal)
Data <- verbal[, 1:24]        # the 24 item columns
group <- verbal[, "Gender"]   # group membership (column 26), not an item column

res1 <- ILHTEdif(
  resp_mat    = Data,
  group       = group,
  alpha       = 0.05
)

# With simulated data, forcing NF = NR

set.seed(2025)
NR <- 300
sim <- SimDichoDif(
  It     = 20,
  ItDIFa = c(2, 5),
  ItDIFb = c(8, 12),
  NR     = NR,
  NF     = NR,         # Same size for NF and NR
  a      = rep(1, 20),
  b      = rnorm(20, 0, 1),
  Ga     = c(0.5, -0.5),
  Gb     = c(1, -1)
)

# Extract response matrix and group vector
resp_mat    <- sim$data[, 1:20]
group       <- factor(sim$data[, 21], labels = c("Ref", "Focal"))
subject_ids <- seq_len(nrow(resp_mat))

# Run the DIF analysis
res2 <- ILHTEdif(
  resp_mat    = resp_mat,
  group       = group,
  subject_ids = subject_ids,
  alpha       = 0.05
)

# With rest-score matching (no purification)
res3 <- ILHTEdif(
  resp_mat    = resp_mat,
  group       = group,
  subject_ids = subject_ids,
  alpha       = 0.05,
  nAGQ        = 1,
  purify      = FALSE,          # purification switched off
  match       = "restscore",
  maxIter     = 3               # only relevant when purify = TRUE
)

# With purification

res4 <- ILHTEdif(
  resp_mat    = resp_mat,
  group       = group,
  subject_ids = subject_ids,
  alpha       = 0.05,
  nAGQ        = 1,
  purify      = TRUE,           # activate purification
  match       = "total",
  maxIter     = 3               # up to 3 purification passes
)


# View results for res2
print(res2$itemDIF)   # all Zeta estimates
print(res2$itemSig)   # those beyond ±1.96·SD
print(res2$plot)      # plot of Zeta ±1.96·SD
## End(Not run)