Performs DIF detection using the Breslow-Day method.
difBD(Data, group, focal.name, anchor = NULL, match = "score", BDstat = "BD",
alpha = 0.05, purify = FALSE, nrIter = 10, p.adjust.method = NULL,
save.output = FALSE, output = c("out", "default"))
# S3 method for BD
print(x, ...)
# S3 method for BD
plot(x, pch = 8, number = TRUE, col = "red", save.plot = FALSE,
save.options = c("plot", "default", "pdf"), ...)
Data
numeric: either the data matrix only, or the data matrix plus the vector of group membership. See Details.
group
numeric or character: either the vector of group membership or the column indicator (within Data) of group membership. See Details.
focal.name
numeric or character indicating the level of group which corresponds to the focal group.
anchor
either NULL (default) or a vector of item names (or identifiers) to specify the anchor items. See Details.
match
specifies the type of matching criterion. Can be either "score" (default) to compute the test score, or any continuous or discrete variable with the same length as the number of rows of Data. See Details.
BDstat
character specifying the DIF statistic to be used. Possible values are "BD" (default) and "trend". See Details.
alpha
numeric: significance level (default is 0.05).
purify
logical: should the method be used iteratively to purify the set of anchor items? (default is FALSE).
nrIter
numeric: the maximal number of iterations in the item purification process (default is 10).
p.adjust.method
either NULL (default) or the acronym of the method for p-value adjustment for multiple comparisons. See Details.
save.output
logical: should the output be saved into a text file? (default is FALSE).
output
character: a vector of two components. The first component is the name of the output file, the second component is either the file path or "default" (default value). See Details.
x
an object of class "BD", as returned by difBD.
pch, col
the usual pch and col graphical options.
number
logical: should the item number identification be printed? (default is TRUE).
save.plot
logical: should the plot be saved into a separate file? (default is FALSE).
save.options
character: a vector of three components. The first component is the name of the output file, the second component is either the file path or "default" (default value), and the third component is the file extension, either "pdf" (default) or "jpeg". See Details.
...
other generic parameters for the plot or the print functions.
A list of class "BD" with the following components:
a matrix with one row per item and three columns: the first one contains the Breslow-Day statistic value, the second column indicates the degrees of freedom, and the last column displays the asymptotic p-values.
the vector of p-values for the BD statistics.
the significance level for DIF detection.
either the column indicators of the items which were detected as DIF items, or "No DIF item detected".
the value of the BDstat argument.
a character string, either "score" or "matching variable" depending on the match argument.
the value of the p.adjust.method argument.
either NULL or the vector of adjusted p-values for multiple comparisons.
the value of the purify option.
the number of iterations in the item purification process. Returned only if purify is TRUE.
a binary matrix with one row per iteration in the item purification process and one column per item. Zeros and ones in the i-th row refer to items which were classified respectively as non-DIF and DIF items at the (i-1)-th step. The first row corresponds to the initial classification of the items. Returned only if purify is TRUE.
logical indicating whether the iterative item purification process stopped before the maximal number nrIter of allowed iterations. Returned only if purify is TRUE.
the names of the items.
the value of the anchor argument.
the value of the save.output argument.
the value of the output argument.
The Breslow-Day method (Breslow and Day, 1980) detects non-uniform differential item functioning without requiring an item response model.
Data is a matrix whose rows correspond to the subjects and columns to the items. In addition, Data can hold the vector of group membership. If so, group indicates the column of Data which corresponds to the group membership, either by specifying its name or by giving the column number. Otherwise, group must be a vector of the same length as nrow(Data).
Missing values are allowed for item responses (not for group membership) but must be coded as NA values. They are discarded from sum-score computation.
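As a minimal sketch of what discarding missing values from the sum score can look like, the base R snippet below computes a raw score with rowSums and na.rm = TRUE. The toy matrix and the object names are made up for illustration, and this mirrors the statement above rather than the package's internal code.

```r
# Toy 0/1 response matrix: 4 subjects (rows) by 3 items (columns); values are made up
responses <- matrix(c( 1,  0, NA,
                       1,  1,  1,
                      NA, NA,  0,
                       0,  1,  1),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(NULL, c("item1", "item2", "item3")))

# Raw score per subject, with NA responses dropped from the sum
raw_score <- rowSums(responses, na.rm = TRUE)
raw_score
```

Note that with this convention a subject with only missing responses gets a score of zero.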
The vector of group membership must hold only two different values, either as numeric or character. The focal group is defined by the value of the argument focal.name.
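The two ways of supplying group membership, and the role of focal.name, can be sketched in base R as follows. The data frame, the variable names (dat, group_col, focal_name) and the 0/1 coding of the result are illustrative assumptions, not the package's internal implementation.

```r
# Toy data: two items plus a group column (all values are made up)
dat <- data.frame(item1  = c(1, 0, 1, 1),
                  item2  = c(0, 0, 1, 1),
                  Gender = c(0, 1, 1, 0))

group_col  <- "Gender"   # could equally be the column number 3
focal_name <- 1          # level of the group variable that marks the focal group

# Column position of the group variable, whether given by name or by number
group_idx <- if (is.character(group_col)) match(group_col, colnames(dat)) else group_col

items <- dat[, -group_idx, drop = FALSE]   # item responses only
group <- dat[, group_idx]                  # group membership vector

# Exactly two distinct group values are expected
stopifnot(length(unique(group)) == 2)

# 1 = focal group, 0 = reference group
focal <- as.integer(group == focal_name)
focal
```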
Two test statistics are available: the usual Breslow-Day statistic for testing homogeneous association (Aguerri, Galibert, Attorresi and Maranon, 2009) and the trend test statistic for assessing some monotonic trend in the odds ratios (Penfield, 2003). The DIF statistic is supplied by the BDstat argument, with values "BD" (default) for the usual statistic and "trend" for the trend test statistic.
The matching criterion can be either the test score or any other continuous or discrete variable to be passed to the breslowDay function. This is specified by the match argument. By default, it takes the value "score" and the test score (i.e. raw score) is computed. The second option is to assign to match a vector of continuous or discrete numeric values, which acts as the matching criterion. Note that for consistency this vector should not belong to the Data matrix.
The threshold (or cut-score) for classifying items as DIF is the quantile of the chi-squared distribution with lower-tail probability of one minus alpha. The degrees of freedom depend on the DIF statistic. With the usual Breslow-Day statistic (BDstat == "BD"), it is the number of partial tables taken into account (Aguerri et al., 2009). With the trend test statistic, the degrees of freedom are always equal to one (Penfield, 2003).
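The threshold rule above can be reproduced with the base R quantile function qchisq. The number of partial tables and the statistic values below are made-up numbers used only to illustrate the comparison.

```r
alpha    <- 0.05
n_tables <- 10   # hypothetical number of partial tables taken into account

# Cut-scores: chi-squared quantile with lower-tail probability 1 - alpha
thr_bd    <- qchisq(1 - alpha, df = n_tables)   # usual Breslow-Day statistic
thr_trend <- qchisq(1 - alpha, df = 1)          # trend test statistic

# Made-up Breslow-Day statistics for two items
bd_stat <- c(item1 = 12.4, item2 = 21.7)

# An item is flagged as DIF when its statistic exceeds the threshold
flagged <- bd_stat > thr_bd
flagged
```

With these numbers only item2 exceeds the threshold for ten degrees of freedom.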
Item purification can be performed by setting purify to TRUE. Purification works as follows: if at least one item was detected as functioning differently at the first step of the process, then the data set of the next step consists of all items that are currently anchor (DIF-free) items, plus the tested item (if necessary). The process stops either when two successive applications of the method yield the same classification of the items (Clauser and Mazor, 1998), or when nrIter iterations have been run without obtaining two successive identical classifications. In the latter case a warning message is printed.
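The purification loop described above can be sketched generically in base R. Here detect is a hypothetical function standing in for one application of the DIF method: it receives the indices of the current anchor items and returns a logical vector flagging each item. The function name purify_sketch, the toy detector and the names of the returned list elements are assumptions for illustration only.

```r
# `detect(anchor)` is hypothetical: it returns TRUE for items flagged as DIF
# when matching is based on the items in `anchor` (plus the tested item).
purify_sketch <- function(detect, n_items, nrIter = 10) {
  flagged   <- detect(seq_len(n_items))        # first step: all items are anchors
  history   <- matrix(as.integer(flagged), nrow = 1)
  converged <- !any(flagged)                   # nothing flagged: nothing to purify
  iter <- 0
  while (!converged && iter < nrIter) {
    iter <- iter + 1
    new_flagged <- detect(which(!flagged))     # anchors = currently DIF-free items
    history     <- rbind(history, as.integer(new_flagged))
    converged   <- identical(new_flagged, flagged)  # same classification twice in a row
    flagged     <- new_flagged
  }
  if (!converged) warning("no convergence after ", nrIter, " iterations")
  list(flagged = flagged, iterations = iter, history = history, converged = converged)
}

# Toy detector: item 3 is always flagged; item 2 is flagged only once item 3
# has been removed from the anchor set.
toy_detect <- function(anchor) c(FALSE, !(3 %in% anchor), TRUE)

res <- purify_sketch(toy_detect, n_items = 3)
res$history
```

Each row of history records one classification, the first row being the initial one, which is the same layout as the binary purification matrix described in the returned value.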
Adjustment for multiple comparisons is possible with the argument p.adjust.method. The latter must be an acronym of one of the available adjustment methods of the p.adjust function. According to Kim and Oshima (2013), Holm and Benjamini-Hochberg adjustments (set respectively by "holm" and "BH") perform best for DIF purposes. See the p.adjust function for further details. Note that item purification is performed on the original statistics and p-values; any adjustment for multiple comparisons is applied after item purification.
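The effect of the two recommended adjustments can be seen directly with the base R function p.adjust. The p-values below are made up; in practice they would be the unadjusted p-values of the DIF statistics.

```r
alpha <- 0.05
p_raw <- c(0.001, 0.012, 0.030, 0.200)   # made-up unadjusted p-values

# Holm and Benjamini-Hochberg adjustments, as named in p.adjust.methods
p_holm <- p.adjust(p_raw, method = "holm")
p_bh   <- p.adjust(p_raw, method = "BH")

# Items flagged at level alpha under each adjustment
which(p_holm < alpha)
which(p_bh < alpha)
```

In this toy case the Holm adjustment flags fewer items than Benjamini-Hochberg, which is expected since it controls the family-wise error rate rather than the false discovery rate.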
A pre-specified set of anchor items can be provided through the anchor argument. It must be a vector of either item names (which must match exactly the column names of the Data argument) or integer values (specifying the column numbers for item identification). When anchor items are provided, they are used to compute the test score (matching criterion), together with the tested item. None of the anchor items are tested for DIF: the output separates anchor items and tested items, and DIF results are returned only for the latter. Note also that item purification is not activated when anchor items are provided (even if purify is set to TRUE). By default anchor is NULL, so that no anchor item is specified.
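A small base R sketch of the matching score when anchor items are supplied: the score for a tested item is the sum over the anchor items plus the tested item itself. The response matrix and the object names are made up, and this illustrates the rule stated above rather than the package's own code.

```r
# Toy 0/1 responses: 3 subjects by 4 items (values are made up)
responses <- matrix(c(1, 0, 1, 1,
                      0, 0, 1, 0,
                      1, 1, 1, 1),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(NULL, paste0("item", 1:4)))

anchor <- 1:2   # pre-specified anchor items
tested <- 4     # item currently tested for DIF

# Matching score: anchor items plus the tested item
match_cols <- union(anchor, tested)
score <- rowSums(responses[, match_cols, drop = FALSE], na.rm = TRUE)
score
```

Item 3 is neither an anchor nor the tested item here, so it does not contribute to the score.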
The output of difBD, as displayed by the print.BD function, can be stored in a text file provided that save.output is set to TRUE (the default value FALSE does not execute the storage). In this case, the name of the text file must be given as a character string in the first component of the output argument (default name is "out"), and the path for saving the text file can be given through the second component of output. The default value is "default", meaning that the file will be saved in the current working directory. Any other path can be specified as a character string: see the Examples section for an illustration.
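How a two-component output vector might be turned into a file name can be sketched as below. The helper output_file is hypothetical and not part of the package; the ".txt" extension is an assumption based on the output being described as a text file.

```r
# Hypothetical helper: build a full file name from c(name, path)
output_file <- function(spec, ext = ".txt") {
  # "default" means the current working directory
  dir <- if (spec[2] == "default") getwd() else spec[2]
  file.path(dir, paste0(spec[1], ext))
}

output_file(c("BDresults", "default"))   # file in the working directory
output_file(c("BDresults", tempdir()))   # file in a user-chosen directory
```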
The plot.BD function displays the DIF statistics in a plot, with each item on the X axis. The type of point and the colour are set by the usual pch and col arguments. The number option displays the item numbers instead. The plot can also be stored in a figure file, in either PDF or JPEG format, by setting save.plot to TRUE. The figure file is defined through the components of save.options. The first two components work like those of the output argument. The third component is the figure format, with allowed values "pdf" (default) for a PDF file and "jpeg" for a JPEG file.
Aguerri, M.E., Galibert, M.S., Attorresi, H.F. and Maranon, P.P. (2009). Erroneous detection of nonuniform DIF using the Breslow-Day test in a short test. Quality and Quantity, 43, 35-44. 10.1007/s11135-007-9130-2
Breslow, N.E. and Day, N.E. (1980). Statistical methods in cancer research, vol. I: The analysis of case-control studies. Scientific Publication No 32. International Agency for Research on Cancer, Lyon.
Clauser, B.E. and Mazor, K.M. (1998). Using statistical procedures to identify differential item functioning test items. Educational Measurement: Issues and Practice, 17, 31-44.
Kim, J. and Oshima, T.C. (2013). Effect of multiple testing adjustment in differential item functioning detection. Educational and Psychological Measurement, 73, 458-470. 10.1177/0013164412467033
Magis, D., Beland, S., Tuerlinckx, F. and De Boeck, P. (2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42, 847-862. 10.3758/BRM.42.3.847
Penfield, R.D. (2003). Application of the Breslow-Day test of trend in odds ratio heterogeneity to the detection of nonuniform DIF. Alberta Journal of Educational Research, 49, 231-243.
# NOT RUN {
# Loading the difR package and the verbal data
library(difR)
data(verbal)
# Excluding the "Anger" variable
verbal <- verbal[colnames(verbal) != "Anger"]
# Three equivalent settings of the data matrix and the group membership
difBD(verbal, group = 25, focal.name = 1)
difBD(verbal, group = "Gender", focal.name = 1)
difBD(verbal[,1:24], group = verbal[,25], focal.name = 1)
# With the BD trend test statistic
difBD(verbal, group = 25, focal.name = 1, BDstat = "trend")
# Multiple comparisons adjustment using Benjamini-Hochberg method
difBD(verbal, group = 25, focal.name = 1, p.adjust.method = "BH")
# With item purification
difBD(verbal, group = "Gender", focal.name = 1, purify = TRUE)
difBD(verbal, group = "Gender", focal.name = 1, purify = TRUE, nrIter = 5)
# With items 1 to 5 set as anchor items
difBD(verbal, group = "Gender", focal.name = 1, anchor = 1:5)
difBD(verbal, group = "Gender", focal.name = 1, anchor = 1:5, purify = TRUE)
# Saving the output into the "BDresults.txt" file (and default path)
r <- difBD(verbal, group = 25, focal.name = 1, save.output = TRUE,
output = c("BDresults","default"))
# Graphical devices
plot(r)
# Plotting results and saving it in a PDF figure
plot(r, save.plot = TRUE, save.options = c("plot", "default", "pdf"))
# Changing the path, JPEG figure
path <- "c:/Program Files/"
plot(r, save.plot = TRUE, save.options = c("plot", path, "jpeg"))
# }