The algorithm fits multiple GRMTrees, each on a random sample of the original
data drawn by either bootstrap sampling or subsampling. The observations not
drawn for a given tree form its out-of-bag (OOB) sample: roughly one-third of
the data (about 37%) under bootstrap sampling, and the complement of the
subsample under subsampling. OOB samples are used for internal validation and
variable importance calculation. Aggregating over many trees reduces variance,
mitigates overfitting, and gives more reliable identification of covariates
associated with DIF.

Key advantages of the GRM Forest approach include:

- Enhanced stability in DIF detection across different sampling variations
- Robust variable importance measures that quantify the relative contribution
  of each covariate to DIF patterns
- Reduced false positive rates through consensus-based detection
- Ability to handle high-dimensional covariate spaces effectively
- Internal validation through out-of-bag error estimation

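
The text does not say how consensus across trees is computed, so the following is only an illustrative sketch of one plausible reading: count how often each covariate is used as a split variable across the forest and keep those selected in at least a given share of trees. The function names, the input format (one collection of split-variable names per tree), and the 0.5 threshold are assumptions made for this example, not part of the GRM Forest implementation.

```python
from collections import Counter


def selection_frequency(tree_split_vars, covariates):
    """Share of trees in which each covariate appears as a split variable.

    tree_split_vars: one iterable of covariate names per fitted tree
    (a hypothetical input format chosen for this sketch).
    """
    n_trees = len(tree_split_vars)
    if n_trees == 0:
        raise ValueError("at least one tree is required")
    counts = Counter()
    for split_vars in tree_split_vars:
        # Count a covariate at most once per tree, however often it splits.
        counts.update(set(split_vars))
    return {name: counts[name] / n_trees for name in covariates}


def consensus_covariates(frequencies, threshold=0.5):
    """Covariates selected in at least `threshold` of the trees (assumed rule)."""
    return sorted(name for name, f in frequencies.items() if f >= threshold)


# Toy forest of four trees; the last tree found no split at all.
trees = [{"age", "gender"}, {"age"}, {"age", "education"}, set()]
freq = selection_frequency(trees, ["age", "gender", "education"])
flagged = consensus_covariates(freq, threshold=0.5)
print(freq)
print(flagged)
```

A covariate that a single tree picks up by chance receives a low frequency and is not flagged, which is the intuition behind reduced false positives; the threshold trades sensitivity against specificity and would need to be chosen for the application.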
The forest implementation supports both bootstrap aggregation (where samples
are drawn with replacement) and subsampling (without replacement), allowing
flexibility for different data characteristics and research objectives.
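
The difference between the two sampling schemes, and how each one determines the out-of-bag set, can be sketched as follows. This is a minimal Python illustration using only the standard library; the function name `draw_in_bag`, its parameters, and the default `sample_fraction` of 0.632 are assumptions for the example and do not reflect the package's actual interface.

```python
import random


def draw_in_bag(n_obs, method="bootstrap", sample_fraction=0.632, rng=None):
    """Draw row indices for one tree and return (in_bag, oob).

    method="bootstrap": n_obs draws with replacement (duplicates allowed).
    method="subsample": a fixed fraction of rows drawn without replacement.
    """
    rng = rng or random.Random()
    if method == "bootstrap":
        in_bag = [rng.randrange(n_obs) for _ in range(n_obs)]
    elif method == "subsample":
        size = max(1, int(round(sample_fraction * n_obs)))
        in_bag = rng.sample(range(n_obs), size)
    else:
        raise ValueError("method must be 'bootstrap' or 'subsample'")
    chosen = set(in_bag)
    # Out-of-bag rows are those never drawn for this tree.
    oob = [i for i in range(n_obs) if i not in chosen]
    return in_bag, oob


rng = random.Random(42)  # fixed seed so the sketch is reproducible
for method in ("bootstrap", "subsample"):
    in_bag, oob = draw_in_bag(1000, method=method, rng=rng)
    print(method, "unique in-bag:", len(set(in_bag)), "OOB share:", len(oob) / 1000)
```

Under bootstrap sampling the OOB share is random and settles near 37% for large samples, because each row is missed by all n draws with probability (1 - 1/n)^n, which approaches exp(-1). Under subsampling the OOB share is fixed at one minus the sampling fraction.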