pairSE: Item Parameter Calculation with Standard Errors

Description

Calculates the item parameters for dichotomous items (difficulties) or polytomous items (Thurstonian thresholds) together with their standard errors (SE). All parameters are calculated using a generalization (see Heine & Tarnai, 2015) of the pairwise comparison algorithm (Choppin, 1968, 1985). A high proportion of missing values in the data matrix is allowed, as long as the items are properly linked to each other.

Usage

pairSE(
  daten,
  m = NULL,
  w = NULL,
  nsample = 30,
  size = 0.5,
  seed = "no",
  pot = TRUE,
  zerocor = TRUE,
  verbose = TRUE,
  likelihood = NULL,
  pot2 = 2,
  delta = TRUE,
  conv = 1e-04,
  maxiter = 3000,
  progress = TRUE,
  init = NULL,
  zerosum = TRUE,
  ...
)

Value

A list object of class c("pairSE","list") containing the item category thresholds, the item difficulties (sigma), and their standard errors.

Arguments

daten: a data.frame or matrix with optionally named columns (names of items), potentially with missing values, comprising polytomous or dichotomous responses (or a mix of category numbers across items) of n respondents (rows) on k items (columns), coded starting with 0 for the lowest category up to m-1 for the highest category, with m being a vector (of length k) giving the number of categories for the respective item.
m: an integer (recycled to a vector of length k) or a vector giving the number of response categories for all items. By default (m = NULL), m is calculated from the data, assuming that every response category is observed at least once. For sparse data it is strongly recommended to define the number of categories explicitly through this argument.
w: an optional vector of case weights.
nsample: numeric specifying the number of subsamples drawn from the data, which is the number of replications of the parameter calculation. WARNING! Specifying high values for nsample ( > 100 ) may result in long computing times without leading to "better" estimates for SE. This may also be the case when choosing size="jack" (see argument size) in combination with large datasets (N > 5000).
size: numeric with valid range between 0 and 1 (but not exactly 0 or 1) specifying the size of the subsample of the data when bootstrapping for SE estimation. As an alternative, size can be set to the character "jack" (size="jack"). This sets the subsample size to N-1 and sets nsample=N (see argument nsample), with N being the number of persons in daten.
seed: numeric used for set.seed(seed).
pot: either a logical or an integer >= 2 defining the power of the pairwise comparison matrix to compute. If TRUE (default), a power of three of the pairwise comparison matrix is used for further calculations. If FALSE, no powers are computed.
zerocor: either a logical or a numeric value greater than 0 and at most 1. If zerocor is set to TRUE (default), unobserved combinations (1-0, 0-1) in the data for each pair of items are given a frequency of one, following the proposal by Alexandrowicz (2011, p.373). Alternatively, a numeric value greater than 0 and at most 1 can be assigned to unobserved combinations (1-0, 0-1) in the data for each pair of items (following personal communication with A. Robitzsch; 29-03-2021).
verbose: logical, if verbose = TRUE (default) a message about subsampling is sent to console when calculating standard errors.
likelihood: see pair.
pot2: see pair.
delta: see pair.
conv: see pair.
maxiter: see pair.
progress: see pair.
init: see pair.
zerosum: see pair.
...: additional parameters passed through.
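
The recommendation to set m explicitly for sparse data can be illustrated with a small base R sketch. The data frame below is made up, and the rule "highest observed code plus one" is only an assumption about how a category count could be inferred from data; this page does not show how pairSE derives m internally.

```r
# Hypothetical sparse data: 3 items intended to have 4 categories (codes 0-3)
daten <- data.frame(
  item1 = c(0, 1, 2, 3, NA, 1),
  item2 = c(0, 1, 1, 2, 2, NA),  # category 3 was never observed for this item
  item3 = c(NA, 0, 3, 1, 2, 2)
)

# Assumed inference rule: number of categories = highest observed code + 1
m_inferred <- sapply(daten, function(x) max(x, na.rm = TRUE) + 1)
m_inferred  # item2 is counted with one category too few

# Explicit definition: a single integer recycled to one value per item
m_explicit <- rep(4, length.out = ncol(daten))
m_explicit
```

When a category is missing from the observed responses, an inferred count of this kind understates the intended response format, which is why passing m directly is the safer choice.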

A note on standard errors

Standard errors are estimated by repeatedly calculating the item parameters for subsamples of the given data. This procedure is mainly controlled by the arguments nsample and size (see Arguments). With regard to computing time, nsample may be the 'time killer'. On the other hand, the estimated standard errors do not necessarily improve with large values of nsample. For example, choosing nsample=400 changes the standard error estimates only minimally compared with the default setting nsample=30 (see Examples).
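
The general idea of subsample-based standard errors can be sketched in base R. This is not the code used by pairSE: the estimator est() is a simple stand-in (a logit of an item's proportion), the data are simulated, and both the sampling without replacement and the use of the standard deviation across replications are assumptions made for illustration.

```r
set.seed(1)

# Simulated dichotomous responses of 500 persons to one item (hypothetical data)
x <- rbinom(500, size = 1, prob = 0.3)

# Stand-in estimator (NOT the pairwise algorithm): logit of the proportion of zeros
est <- function(v) {
  p <- mean(v)
  log((1 - p) / p)
}

# Repeat the estimate on `nsample` random subsamples of relative size `size`
# and take the standard deviation across replications
subsample_se <- function(v, nsample = 30, size = 0.5) {
  n_sub <- round(length(v) * size)
  reps <- replicate(nsample, est(sample(v, n_sub)))
  sd(reps)
}

se_30  <- subsample_se(x, nsample = 30)
se_400 <- subsample_se(x, nsample = 400)
c(se_30 = se_30, se_400 = se_400)
```

In this toy setting the two values are typically of similar magnitude, while the run with nsample = 400 performs more than ten times as many replications.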

Details

Parameter calculation is based on the construction of a paired comparison matrix Mnicjc with entries ficjc, representing the number of respondents who answered item i in category c and item j in category c-1. This extends Choppin's (1968, 1985) conditional pairwise algorithm to polytomous item response formats. The algorithm is implemented by matrix multiplication.
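
For the dichotomous special case, the construction of the comparison counts by matrix multiplication can be sketched as follows. The data are simulated, all object names are illustrative, and taking row means of log ratios is one common way to obtain centred difficulties from such counts; it is an assumption here, not a description of the package internals.

```r
set.seed(2)

# Hypothetical dichotomous data: 200 persons, 4 items, simulated under a Rasch model
theta <- rnorm(200)                     # person parameters
delta <- c(-1, -0.3, 0.4, 0.9)          # true item difficulties (sum to zero)
p <- plogis(outer(theta, delta, "-"))   # P(X = 1) = logistic(theta - delta)
X <- matrix(rbinom(length(p), 1, p), nrow = 200)
X[sample(length(X), 40)] <- NA          # some missing responses

# Indicator matrices; a missing response contributes to neither count
X1 <- (!is.na(X) & X == 1) * 1
X0 <- (!is.na(X) & X == 0) * 1

# f[i, j] = number of persons with item i in category 1 and item j in category 0
f <- t(X1) %*% X0

# f[j, i] / f[i, j] estimates exp(delta_i - delta_j); averaging the log ratios
# over j gives difficulties centred at zero (diagonal set to 0)
D <- log(t(f) / f)
diag(D) <- 0
sigma <- rowMeans(D)
round(sigma, 2)
```

Because only pairs of observed responses enter the counts, persons with missing values on some items still contribute to all item pairs they answered.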

Standard errors are estimated by repeatedly calculating the item parameters for subsamples of the given data.

To avoid numerical problems with off-diagonal zeros when constructing the pairwise comparison matrix Mnicjc, powers of the Mnicjc matrix can be used (Choppin, 1968, 1985). Using a power of Mnicjc (argument pot=TRUE, the default) replaces the results of the direct comparisons between i and j with the sum of the indirect comparisons of i and j through an intermediate k.
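
The effect of a matrix power on an off-diagonal zero can be seen in a toy example. The counts below are made up, and the sketch shows only the simplest case of squaring; how pairSE treats the diagonal or which power it applies internally is not covered here.

```r
# Toy comparison counts for 4 items (made-up numbers);
# the combination for items 2 and 3 was never observed, so f[2, 3] is 0
f <- matrix(c(0, 5, 3, 4,
              2, 0, 0, 3,
              4, 6, 0, 2,
              3, 4, 5, 0), nrow = 4, byrow = TRUE)

# Squaring: f2[i, j] = sum over k of f[i, k] * f[k, j],
# i.e. the indirect comparisons of i and j through every intermediate item k
f2 <- f %*% f

f2[2, 3]  # 2*3 (via item 1) + 3*5 (via item 4) = 21, no longer zero
```

The direct comparison of items 2 and 3 is empty, but both items were compared with items 1 and 4, and these indirect paths fill the cell.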

In general, it is recommended to use the argument with default value pot=TRUE.

References

Choppin, B. (1968). Item Bank using Sample-free Calibration. Nature, 219(5156), 870-872.

Choppin, B. (1985). A fully conditional estimation procedure for Rasch model parameters. Evaluation in Education, 9(1), 29-42.

Heine, J. H. & Tarnai, Ch. (2015). Pairwise Rasch model item parameter recovery under sparse data conditions. Psychological Test and Assessment Modeling, 57(1), 3-36.

Alexandrowicz, R. W. (2011). 'GANZ RASCH': A Free Software for Categorical Data Analysis. Social Science Computer Review, 30(3), 369-379.

Wright, B. D., & Masters, G. N. (1982). Rating Scale Analysis. Chicago: MESA Press.

Examples

library(pairwise) # attach the package that provides pairSE() and bfiN
data(bfiN) # loading example data set

# calculating item parameters and their SE for 5 neuroticism items with 6 answer categories (0-5).
neuro_itempar <- pairSE(daten = bfiN, m = 6)
summary(neuro_itempar) # summary for result

# plotting item thresholds with their 95% CI
plot(neuro_itempar)
plot(neuro_itempar, sortdif = TRUE)

###### example from section 'A note on standard errors' ########
neuro_itempar_400 <- pairSE(daten = bfiN, m = 6, nsample = 400)
plot(neuro_itempar)
plot(neuro_itempar_400)
