DDplot: Graphical representation of difficulty and discrimination/item validity in item analysis

Description

Plots difficulty and (generalized) discrimination, or criterion validity, for the items of a multi-item measurement test using ggplot2. Both indices are plotted for each item, and items are ordered by their difficulty.

Usage

DDplot(data, item.names, discrim = "ULI", k = 3, l = 1, u = 3,
       maxscore, minscore, bin = FALSE, cutscore, average.score = FALSE,
       thr = 0.2, criterion = "none", val_type = "simple")

Arguments

data

numeric: binary or ordinal data matrix or data.frame whose rows represent examinee answers (1 correct, 0 incorrect, or ordinal item scores) and whose columns correspond to the items.

item.names

character: the names of items. If not specified, the names of data columns are used.

discrim

character: type of discrimination index to be calculated. Possible values are "ULI" (default), "RIT", "RIR", and "none". See Details.

k

numeric: number of groups into which the data are divided by the total score to estimate discrimination using discrim = "ULI". Default value is 3. See Details.

l

numeric: lower group. Default value is 1. See Details.

u

numeric: upper group. Default value is 3. See Details.

maxscore

numeric: maximal scores of items. If a single number is provided, the same maximal score is used for all items. If missing, a vector of achieved maximal scores is calculated and used in calculations.

minscore

numeric: minimal scores of items. If a single number is provided, the same minimal score is used for all items. If missing, a vector of achieved minimal scores is calculated and used in calculations.

bin

logical: should the ordinal data be binarized? Default value is FALSE. If bin = TRUE, all values of data greater than or equal to cutscore are marked as 1 and all values lower than cutscore are marked as 0.

cutscore

numeric: cut-score used to binarize data. If a single number is provided, the same cut-score is used for all items. If missing, a vector of maximal scores is used in calculations.

average.score

logical: should the average score of the item be displayed instead of difficulty? Default value is FALSE. See Details.

thr

numeric: value of discrimination threshold. Default value is 0.2. With thr = NULL, no horizontal line is displayed in the plot.

criterion

numeric or logical vector: values of criterion. If supplied, the discrim argument is ignored and item-criterion correlation (validity) is displayed instead. Default value is "none".

val_type

character: criterion validity measure. Possible values are "simple" (correlation between item score and validity criterion; default) and "index" (item validity index calculated as cor(item, criterion) * sqrt(((N - 1) / N) * var(item)), where N is number of respondents, see Allen & Yen, 1979, Ch. 6.4, for details). The argument is ignored if user does not supply any criterion.
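The two val_type measures can be illustrated with a minimal base R sketch. It only restates the formulas given above and is not the function's internal code; the item and criterion values are invented for illustration.

```r
# Hypothetical toy data: one binary item and a numeric criterion for 6 respondents
item <- c(0, 1, 1, 0, 1, 1)
criterion <- c(10, 14, 15, 9, 12, 16)
N <- length(item)

# val_type = "simple": Pearson correlation between item score and criterion
simple <- cor(item, criterion)

# val_type = "index": the correlation scaled by the item's standard deviation
# computed with divisor N (hence the (N - 1) / N correction of var())
index <- simple * sqrt(((N - 1) / N) * var(item))

round(c(simple = simple, index = index), 3)
```

For a binary item with proportion correct p, the scaling factor reduces to sqrt(p * (1 - p)), so the index is never larger in absolute value than the simple correlation.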

Details

Discrimination is calculated using the method specified in discrim. The default option "ULI" calculates the difference in the ratio of correct answers between the upper and lower third of students. The "RIT" index calculates the correlation between the item score and the test total score. The "RIR" index calculates the correlation between the item score and the total score for the rest of the items. With option "none", only difficulty is displayed.
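The three indices can be reproduced by hand on a small data set. The following is a rough base R sketch, not the package implementation: the toy responses are invented, and the rule used here to split respondents into thirds (equal-sized groups by rank of total score) is an assumption that may differ from how DDplot handles ties and uneven group sizes.

```r
# Hypothetical toy data: 6 respondents (rows) by 5 binary items (columns);
# respondent i answers items 1..(i - 1) correctly, so total scores are 0..5
resp <- outer(1:6, 1:5, ">") * 1
colnames(resp) <- paste0("item", 1:5)
total <- rowSums(resp)
n <- nrow(resp)

# RIT: Pearson correlation between item score and total score
rit <- apply(resp, 2, function(x) cor(x, total))

# RIR: Pearson correlation between item score and total score of the other items
rir <- vapply(seq_len(ncol(resp)),
              function(j) cor(resp[, j], total - resp[, j]),
              numeric(1))

# ULI: proportion correct in the upper third minus the lower third
# (assumed grouping: respondents sorted by total score, three equal-sized groups)
grp <- integer(n)
grp[order(total)] <- (seq_len(n) * 3 - 1) %/% n + 1
uli <- colMeans(resp[grp == 3, , drop = FALSE]) -
  colMeans(resp[grp == 1, , drop = FALSE])

uli
```

In this toy data the middle items separate the upper and lower groups perfectly (ULI of 1), while the easiest and hardest items reach only 0.5.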

"ULI" index can be generalized using arguments k, l and u. Generalized ULI discrimination is then computed as follows: The function takes data on individuals, computes their total test score and then divides individuals into k groups. The lower and upper group are determined by l and u parameters, i.e. l-th and u-th group where the ordering is defined by increasing total score.
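The generalization can be sketched as a small helper. The function name gen_uli is hypothetical and introduced only for illustration, the data are invented, and the grouping rule (k roughly equal-sized groups by rank of total score) is an assumption rather than the package's documented behavior.

```r
# Hypothetical helper: generalized ULI for a binary response matrix
gen_uli <- function(resp, k = 3, l = 1, u = 3) {
  stopifnot(l >= 1, u <= k, l < u)
  n <- nrow(resp)
  total <- rowSums(resp)
  # assign respondents, sorted by increasing total score, to groups 1..k
  grp <- integer(n)
  grp[order(total)] <- (seq_len(n) * k - 1) %/% n + 1
  # difference in proportion correct between the u-th and l-th group
  colMeans(resp[grp == u, , drop = FALSE]) -
    colMeans(resp[grp == l, , drop = FALSE])
}

# Toy data: 10 respondents by 9 items, total scores 0..9
resp <- outer(1:10, 1:9, ">") * 1

gen_uli(resp)                      # default: upper vs lower third
gen_uli(resp, k = 5, l = 4, u = 5) # 5 groups, comparing the 4th and 5th
```

Comparing two adjacent high-scoring groups, as in the second call, highlights only the items that distinguish among the strongest respondents; easier items get a value of 0 because both groups answer them correctly.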

For ordinal data, difficulty is defined as relative score (achieved - minimal)/(maximal - minimal). Minimal score can be specified by minscore, maximal score can be specified by maxscore. Average score of items can be displayed with argument average.score = TRUE. Note that for binary data difficulty estimate is the same as average score of the item.
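As a worked illustration of the relative-score formula, here is a short base R sketch with invented scores. It assumes that "achieved" in the formula means the average item score; the minscore and maxscore vectors are supplied explicitly.

```r
# Hypothetical toy data: two ordinal items and one binary item, 5 respondents
ord <- data.frame(item1 = c(0, 1, 2, 4, 3),
                  item2 = c(2, 1, 2, 2, 1),
                  item3 = c(1, 0, 1, 1, 0))

avg <- colMeans(ord)      # average item scores: 2, 1.6, 0.6
minscore <- c(0, 0, 0)    # minimal possible score of each item
maxscore <- c(4, 2, 1)    # maximal possible score of each item

# difficulty as relative score: (achieved - minimal) / (maximal - minimal)
difficulty <- (avg - minscore) / (maxscore - minscore)
difficulty                # 0.5, 0.8, 0.6
```

For the binary item3, the difficulty equals its average score, as noted above.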

Note that all correlations are estimated using Pearson correlation coefficient.

References

Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.

Martinkova, P., Stepanek, L., Drabinova, A., Houdek, J., Vejrazka, M., & Stuka, C. (2017). Semi-real-time analyses of item characteristics for medical school admission tests. In: Proceedings of the 2017 Federated Conference on Computer Science and Information Systems.

Examples

# NOT RUN {
# loading 100-item medical admission test data sets
data(dataMedical, dataMedicalgraded)
# binary data set
dataBin <- dataMedical[, 1:100]
# ordinal data set
dataOrd <- dataMedicalgraded[, 1:100]

# DDplot of binary data set
DDplot(dataBin)
# }
# NOT RUN {
# DDplot of binary data set without threshold
DDplot(dataBin, thr = NULL)
# compared to DDplot using ordinal data set and 'bin = TRUE'
DDplot(dataOrd, bin = TRUE)
# compared to binarized data set using bin = TRUE and cutscore equal to 3
DDplot(dataOrd, bin = TRUE, cutscore = 3)

# DDplot of binary data using generalized ULI
# discrimination based on 5 groups, comparing 4th and 5th
# threshold lowered to 0.1
DDplot(dataBin, k = 5, l = 4, u = 5, thr = 0.1)

# DDplot of ordinal data set using ULI
DDplot(dataOrd)
# DDplot of ordinal data set using generalized ULI
# discrimination based on 5 groups, comparing 4th and 5th
# threshold lowered to 0.1
DDplot(dataOrd, k = 5, l = 4, u = 5, thr = 0.1)
# DDplot of ordinal data set using RIT
DDplot(dataOrd, discrim = "RIT")
# DDplot of ordinal data set using RIR
DDplot(dataOrd, discrim = "RIR")
# DDplot of binary data set displaying only difficulty
DDplot(dataBin, discrim = "none")

# DDplot of ordinal data set displaying difficulty estimates
DDplot(dataOrd)
# DDplot of ordinal data set displaying average item scores
DDplot(dataOrd, average.score = TRUE)

# item difficulty / criterion validity plot for data with criterion
data <- difNLR::GMAT[, 1:20]
criterion <- difNLR::GMAT[, "criterion"]
DDplot(data, criterion = criterion, val_type = "simple")
# }
