rm.sdt: Hierarchical Rater Model Based on Signal Detection Theory (HRM-SDT)

Description

This function estimates a version of the hierarchical rater model (HRM) based on signal detection theory (HRM-SDT; DeCarlo, 2005; DeCarlo, Kim & Johnson, 2011; Robitzsch & Steinfeld, 2018). The model is estimated by means of an EM algorithm adapted from multilevel latent class analysis (Vermunt, 2008).

Usage

rm.sdt(dat, pid, rater, Qmatrix=NULL, theta.k=seq(-9, 9, len=30),
    est.a.item=FALSE, est.c.rater="n", est.d.rater="n", est.mean=FALSE, est.sigma=TRUE,
    skillspace="normal", tau.item.fixed=NULL, a.item.fixed=NULL,
    d.min=0.5, d.max=100, d.start=3, c.start=NULL, tau.start=NULL, sd.start=1,
    d.prior=c(3,100), c.prior=c(3,100), tau.prior=c(0,1000), a.prior=c(1,100),
    link_item="GPCM", max.increment=1, numdiff.parm=0.00001, maxdevchange=0.1,
    globconv=.001, maxiter=1000, msteps=4, mstepconv=0.001, optimizer="nlminb" )
# S3 method for rm.sdt
summary(object, file=NULL, ...)
# S3 method for rm.sdt
plot(x, ask=TRUE, ...)
# S3 method for rm.sdt
anova(object,...)
# S3 method for rm.sdt
logLik(object,...)
# S3 method for rm.sdt
IRT.factor.scores(object, type="EAP", ...)
# S3 method for rm.sdt
IRT.irfprob(object,...)
# S3 method for rm.sdt
IRT.likelihood(object,...)
# S3 method for rm.sdt
IRT.posterior(object,...)
# S3 method for rm.sdt
IRT.modelfit(object,...)
# S3 method for IRT.modelfit.rm.sdt
summary(object,...)

Arguments

dat

Data frame of ratings. Every row corresponds to a person-rater combination and every column to a rated item (variable).

pid

Vector of person identifiers, one entry per row of dat.

rater

Vector of rater identifiers, one entry per row of dat.

Qmatrix

An optional Q-matrix of category scores $q_{ik}$. If it is not provided, the ordinary scoring of categories (from 0 to the maximum score $K$) is used by default.

theta.k

A grid of theta values for the ability distribution.

est.a.item

Should item parameters $a_i$ be estimated?

est.c.rater

Type of estimation for the item-rater parameters $c_{irk}$ in the signal detection model. Options are 'n' (no estimation), 'e' (all parameters set equal to each other), 'i' (itemwise estimation), 'r' (raterwise estimation) and 'a' (all parameters estimated independently of each other).

est.d.rater

Type of estimation of the rater parameters $d_{ir}$. Options are the same as in est.c.rater.

est.mean

Optional logical indicating whether the mean of the trait distribution should be estimated.

est.sigma

Optional logical indicating whether the standard deviation of the trait distribution should be estimated.

skillspace

Type of the $\theta$ distribution: "normal" or "discrete". In the latter case, the probabilities at all grid points in theta.k are estimated separately.

tau.item.fixed

Optional matrix with three columns specifying fixed $\tau$ parameters. The first two columns denote item and category indices, the third the fixed value. See Example 3.

a.item.fixed

Optional matrix with two columns specifying fixed $a$ parameters. First column: Item index. Second column: Fixed $a$ parameter.

d.min

Lower bound for estimated $d$ parameters

d.max

Upper bound for estimated $d$ parameters

d.start

Starting value(s) of $d$ parameters

c.start

Starting values of $c$ parameters

tau.start

Starting values of $\tau$ parameters

sd.start

Starting value for trait standard deviation

d.prior

Normal prior $N(M,S^2)$ for $d$ parameters

c.prior

Normal prior for $c$ parameters. The prior mean for parameter $c_{irk}$ is defined as $M \cdot (k - 0.5)$, where $M$ is c.prior[1].

tau.prior

Normal prior for $\tau$ parameters

a.prior

Normal prior for $a$ parameters

link_item

Type of item response function for latent responses. Can be "GPCM" for the generalized partial credit model or "GRM" for the graded response model.

max.increment

Maximum increment of item parameters during estimation

numdiff.parm

Numerical differentiation step width

maxdevchange

Maximum relative deviance change as a convergence criterion

globconv

Convergence criterion for the maximum parameter change

maxiter

Maximum number of iterations

msteps

Maximum number of iterations during an M step

mstepconv

Convergence criterion in an M step

optimizer

Choice of optimization function in M-step for item parameters. Options are "nlminb" for stats::nlminb and "optim" for stats::optim.

object

Object of class rm.sdt

file

Optional file name in which summary should be written.

Object of class rm.sdt (the x argument of the plot method)

ask

Optional logical indicating whether a new plot should be asked for.

type

Factor score estimation method. Currently, only type="EAP" is supported.

…

Further arguments to be passed
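As a rough illustration of the skillspace argument, the following base-R sketch assumes that skillspace="normal" places a normal density with a given mean and standard deviation on the grid theta.k and renormalizes it, whereas skillspace="discrete" treats each grid probability as a free parameter. This parametrization is an assumption for illustration, not code taken from sirt; the helper normal_skillspace is hypothetical.

```r
theta.k <- seq(-9, 9, len = 30)   # default grid from the Usage section

# skillspace = "normal" (assumed): discretized normal density, renormalized on the grid
normal_skillspace <- function(theta.k, mean = 0, sd = 1) {
  w <- dnorm(theta.k, mean = mean, sd = sd)
  w / sum(w)
}
pi.k <- normal_skillspace(theta.k, mean = 0, sd = 1)

# skillspace = "discrete": one free probability per grid point, e.g. theta.k = 0:3
# (hypothetical values; in the model these probabilities would be estimated)
theta.discrete <- 0:3
pi.discrete <- c(0.1, 0.4, 0.3, 0.2)
```

With the normal skill space, only the mean and standard deviation (see est.mean and est.sigma) determine all grid probabilities; the discrete skill space spends one parameter per grid point (minus one for the sum constraint).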

Value

A list with the following entries:

deviance

Deviance

Information criteria and number of parameters

item

Data frame with item parameters. The columns N and M denote the number of observed ratings and the observed mean of all ratings, respectively. In addition to the item parameters $\tau_{ik}$ and $a_i$, the mean of the latent response (latM) is computed as $E( \eta_i )=\sum_p \sum_k P( \theta_p ) q_{ik} P( \eta_i=k | \theta_p )$, which provides an item parameter on the original metric of the ratings. The latent standard deviation (latSD) is computed analogously.

rater

Data frame with rater parameters. Transformed $c$ parameters (c_x.trans) are computed as $c_{irk} / d_{ir}$.

person

Data frame with person parameters: EAP and corresponding standard errors

EAP.rel

EAP reliability

Mean of the trait distribution

sigma

Standard deviation of the trait distribution

tau.item

Item parameters $\tau_{ik}$

se.tau.item

Standard error of item parameters $\tau_{ik}$

a.item

Item slopes $a_i$

se.a.item

Standard error of item slopes $a_i$

c.rater

Rater parameters $c_{irk}$

se.c.rater

Standard error of rater severity parameter $c_{irk}$

d.rater

Rater slope parameter $d_{ir}$

se.d.rater

Standard error of rater slope parameter $d_{ir}$

f.yi.qk

Individual likelihood

f.qk.yi

Individual posterior distribution

probs

Item probabilities evaluated at the grid theta.k. Note that these probabilities are computed for the pseudo-items $i \times r$, i.e., the interaction of item and rater.

prob.item

Probabilities $P( \eta_i=\eta | \theta )$ of latent item responses evaluated at theta grid $\theta_p$.

n.ik

Expected counts

pi.k

Estimated trait distribution $P(\theta_p)$.

maxK

Maximum number of categories

procdata

Processed data

iter

Number of iterations

…

Further values
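The following base-R sketch shows how several of the entries above relate to each other: the individual posterior (f.qk.yi) as the individual likelihood (f.yi.qk) times the trait distribution (pi.k), EAPs with standard errors, EAP reliability, and the item summaries latM and latSD. The helper names person_summary and latent_moments are hypothetical and not part of sirt. The reliability formula var(EAP) / (var(EAP) + mean(SE^2)) is a common definition and an assumption here, as is computing latSD as the standard deviation of the marginal latent-response distribution.

```r
## Person parameters from the individual likelihood and the trait distribution
## f.yi.qk: persons x grid matrix of likelihoods; pi.k: P(theta_p) on theta.k
person_summary <- function(f.yi.qk, pi.k, theta.k) {
  joint <- sweep(f.yi.qk, 2, pi.k, "*")        # likelihood times prior, column by column
  post  <- joint / rowSums(joint)               # posterior, one row per person
  EAP   <- as.vector(post %*% theta.k)
  SE    <- sqrt(pmax(as.vector(post %*% theta.k^2) - EAP^2, 0))
  rel   <- var(EAP) / (var(EAP) + mean(SE^2))   # assumed EAP reliability definition
  list(post = post, EAP = EAP, SE.EAP = SE, EAP.rel = rel)
}

## Item summaries on the rating metric
## prob: grid x categories matrix of P(eta_i = k | theta_p); q: category scores q_ik
latent_moments <- function(pi.k, prob, q) {
  w <- colSums(pi.k * prob)            # marginal P(eta_i = k), summed over the grid
  latM <- sum(q * w)                   # E(eta_i)
  latSD <- sqrt(sum((q - latM)^2 * w))
  c(latM = latM, latSD = latSD)
}

# Hypothetical toy inputs
theta.k <- c(-1, 0, 1)
pi.k <- c(0.25, 0.5, 0.25)
f.yi.qk <- rbind(c(0.8, 0.2, 0), c(0, 0.2, 0.8), c(0.1, 0.8, 0.1))
res <- person_summary(f.yi.qk, pi.k, theta.k)
mom <- latent_moments(c(0.5, 0.5), rbind(c(1, 0, 0), c(0, 0, 1)), q = 0:2)
```

In this toy example, the first two persons obtain EAPs of -2/3 and 2/3, and the latent response puts half of its mass on each of the scores 0 and 2, giving latM = 1 and latSD = 1.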

Details

The specification of the model follows DeCarlo et al. (2011). The second level models the ideal rating (latent response) $\eta = 0, \ldots, K$ of person $p$ on item $i$. The option link_item='GPCM' follows the generalized partial credit model $$ P( \eta_{pi}=\eta | \theta_p ) \propto \exp( a_{i} q_{i \eta } \theta_p - \tau_{i \eta } ), $$ where the probabilities are normalized over $\eta = 0, \ldots, K$. The option link_item='GRM' employs the graded response model $$ P( \eta_{pi}=\eta | \theta_p )= \Psi( \tau_{i,\eta + 1} - a_i \theta_p ) - \Psi( \tau_{i,\eta} - a_i \theta_p ). $$
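A minimal base-R sketch of the two second-level response functions for a single item follows. It assumes that the GPCM probabilities are normalized over $\eta = 0, \ldots, K$, that $\Psi$ in the GRM is the logistic distribution function (stats::plogis; stats::pnorm could be passed instead for a probit link), and that the GRM uses boundary thresholds $\tau_{i,0} = -\infty$ and $\tau_{i,K+1} = \infty$. The helpers gpcm_probs and grm_probs are hypothetical; sirt's internal implementation may differ.

```r
# GPCM: P(eta = 0..K | theta) proportional to exp(a * q_eta * theta - tau_eta)
gpcm_probs <- function(theta, a, q, tau) {
  # q, tau: vectors of length K + 1 for categories 0..K
  lin <- sweep(outer(theta, a * q), 2, tau)   # a * q_eta * theta_p - tau_eta
  lin <- lin - apply(lin, 1, max)              # subtract row maxima for numerical stability
  num <- exp(lin)
  num / rowSums(num)                           # rows: theta values; columns: eta = 0..K
}

# GRM: P(eta | theta) = Psi(tau_{eta+1} - a * theta) - Psi(tau_eta - a * theta)
grm_probs <- function(theta, a, tau, Psi = plogis) {
  # tau: K increasing thresholds tau_1..tau_K; boundaries -Inf and Inf are added (assumption)
  bounds <- c(-Inf, tau, Inf)
  cum <- outer(theta, bounds, function(th, t) Psi(t - a * th))
  cum[, -1, drop = FALSE] - cum[, -ncol(cum), drop = FALSE]
}

theta <- c(-1, 0, 1)
P_gpcm <- gpcm_probs(theta, a = 1, q = 0:3, tau = c(0, -0.5, 0, 1))
P_grm  <- grm_probs(theta, a = 1, tau = c(-1, 0, 1))
```

Both functions return a matrix with one row per $\theta$ value and one column per latent category; in the GRM, the probability of the highest category increases with $\theta$ whenever $a_i > 0$.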

At the first level, the ratings $X_{pir}$ for person $p$ on item $i$ and rater $r$ are modeled as a signal detection model $$ P( X_{pir} \le k | \eta_{pi} )= G( c_{irk} - d_{ir} \eta_{pi} ), $$ where $G$ is the logistic distribution function and the categories are $k=1,\ldots, K+1$. Because $G(-x) = 1 - G(x)$, the model can equivalently be written as $$ P( X_{pir} > k | \eta_{pi} )= G( d_{ir} \eta_{pi} - c_{irk}). $$
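The base-R sketch below evaluates the rater-level model for one item-rater pair, using $G$ = stats::plogis and observed categories $1, \ldots, K+1$ as stated above. The thresholds are set to the c.prior means $M (k - 0.5)$ with the default $M = 3$, and $d$ equals the default d.start = 3, so the transformed thresholds $c_{irk}/d_{ir}$ fall at 0.5, 1.5 and 2.5 and the most likely observed category is $k = \eta + 1$. It also combines both levels by summing $P(X | \eta) P(\eta | \theta)$ over $\eta$ for a hypothetical latent distribution, and checks numerically that $1 - G(c - d\eta) = G(d\eta - c)$. The helper sdt_probs is hypothetical, not a sirt function.

```r
# Rater level: P(X <= k | eta) = G(c_k - d * eta) for k = 1..K, G = plogis;
# observed categories are 1..K+1, so P(X <= K+1 | eta) = 1
sdt_probs <- function(eta, thr, d) {
  # eta: latent ratings (e.g. 0:K); thr: K increasing thresholds c_{ir1..irK}; d: precision d_{ir}
  cum <- outer(eta, thr, function(e, ck) plogis(ck - d * e))
  cum <- cbind(0, cum, 1)                                      # add P(X <= 0) and P(X <= K+1)
  cum[, -1, drop = FALSE] - cum[, -ncol(cum), drop = FALSE]    # P(X = k | eta), k = 1..K+1
}

K <- 3
d <- 3                                # default d.start
c_k <- 3 * ((1:K) - 0.5)              # prior means M * (k - 0.5) with M = c.prior[1] = 3
P_rating <- sdt_probs(0:K, c_k, d)    # rows: eta = 0..K; columns: X = 1..K+1
c_trans <- c_k / d                    # transformed thresholds c_{irk} / d_{ir}

# Combining both levels for one theta value: P(X | theta) = sum_eta P(X | eta) P(eta | theta)
p_eta <- c(0.1, 0.2, 0.3, 0.4)        # hypothetical P(eta = 0..K | theta)
p_x <- as.vector(p_eta %*% P_rating)

# Symmetry of the logistic cdf links the cumulative and exceedance forms
sym_ok <- isTRUE(all.equal(1 - plogis(c_k - d * 2), plogis(d * 2 - c_k)))
```

Larger values of $d_{ir}$ concentrate each row of P_rating more sharply on a single category, which is why $d_{ir}$ is interpreted as rater precision.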

The thresholds $c_{irk}$ can be further restricted to $c_{irk}=c_{k}$ (est.c.rater='e'), $c_{irk}=c_{ik}$ (est.c.rater='i') or $c_{irk}=c_{rk}$ (est.c.rater='r'). The same holds for the rater precision parameters $d_{ir}$, which can be restricted to $d$, $d_i$ or $d_r$ (est.d.rater='e', 'i' or 'r').
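To make the restriction options concrete for the precision parameters, the hypothetical helper below expands a restricted parameter vector into the full item-by-rater matrix $d_{ir}$ and checks the number of free parameters for the options 'e', 'i', 'r' and 'a'. It assumes that 'i' means one $d$ per item shared across raters and 'r' one $d$ per rater shared across items; option 'n' (no estimation) is not covered. This is an illustration, not part of sirt.

```r
# Expand restricted precision parameters into an I x R matrix d[i, r] (hypothetical helper)
expand_d <- function(type, I, R, par) {
  n_free <- switch(type, e = 1, i = I, r = R, a = I * R,
                   stop("type must be one of 'e', 'i', 'r', 'a'"))
  stopifnot(length(par) == n_free)
  switch(type,
    e = matrix(par, I, R),                    # d_ir = d   (one value)
    i = matrix(rep(par, times = R), I, R),    # d_ir = d_i (same for all raters)
    r = matrix(rep(par, each = I), I, R),     # d_ir = d_r (same for all items)
    a = matrix(par, I, R))                    # all d_ir free
}

I <- 3; R <- 2
d_equal <- expand_d("e", I, R, 2)
d_item  <- expand_d("i", I, R, c(1, 2, 3))
d_rater <- expand_d("r", I, R, c(5, 6))
d_all   <- expand_d("a", I, R, 1:6)
```

The number of free precision parameters thus grows from 1 ('e') over $I$ ('i') or $R$ ('r') to $I \cdot R$ ('a'); the thresholds $c_{irk}$ follow the same pattern with an additional factor $K$.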

References

DeCarlo, L. T. (2005). A model of rater behavior in essay grading based on signal detection theory. Journal of Educational Measurement, 42, 53-76.

DeCarlo, L. T. (2010). Studies of a latent-class signal-detection model for constructed response scoring II: Incomplete and hierarchical designs. ETS Research Report ETS RR-10-08. Princeton, NJ: ETS.

DeCarlo, L. T., Kim, Y., & Johnson, M. S. (2011). A hierarchical rater model for constructed responses, with a signal detection rater model. Journal of Educational Measurement, 48, 333-356.

Robitzsch, A., & Steinfeld, J. (2018). Item response models for human ratings: Overview, estimation methods, and implementation in R. Psychological Test and Assessment Modeling, 60(1), 101-139.

Vermunt, J. K. (2008). Latent class and finite mixture models for multilevel data sets. Statistical Methods in Medical Research, 17, 33-51.

Examples

# NOT RUN {
#############################################################################
# EXAMPLE 1: Hierarchical rater model (HRM-SDT) data.ratings1
#############################################################################
data(data.ratings1, package="sirt")
dat <- data.ratings1

# Model 1: Partial Credit Model: no rater effects
mod1 <- sirt::rm.sdt( dat[, paste0( "k",1:5) ], rater=dat$rater,
            pid=dat$idstud, est.c.rater="n", d.start=100,  est.d.rater="n" )
summary(mod1)

# Model 2: Generalized Partial Credit Model: no rater effects
mod2 <- sirt::rm.sdt( dat[, paste0( "k",1:5) ], rater=dat$rater,
            pid=dat$idstud, est.c.rater="n", est.d.rater="n",
            est.a.item=TRUE, d.start=100)
summary(mod2)

# Model 3: Equal effects in SDT
mod3 <- sirt::rm.sdt( dat[, paste0( "k",1:5) ], rater=dat$rater,
            pid=dat$idstud, est.c.rater="e", est.d.rater="e")
summary(mod3)

# Model 4: Rater effects in SDT
mod4 <- sirt::rm.sdt( dat[, paste0( "k",1:5) ], rater=dat$rater,
            pid=dat$idstud, est.c.rater="r", est.d.rater="r")
summary(mod4)

#############################################################################
# EXAMPLE 2: HRM-SDT data.ratings3
#############################################################################

data(data.ratings3, package="sirt")
dat <- data.ratings3
dat <- dat[ dat$rater < 814, ]
psych::describe(dat)

# Model 1: item- and rater-specific effects
mod1 <- sirt::rm.sdt( dat[, paste0( "crit",c(2:4)) ], rater=dat$rater,
            pid=dat$idstud, est.c.rater="a", est.d.rater="a" )
summary(mod1)
plot(mod1)

# Model 2: Differing number of categories per variable
mod2 <- sirt::rm.sdt( dat[, paste0( "crit",c(2:4,6)) ], rater=dat$rater,
            pid=dat$idstud, est.c.rater="a", est.d.rater="a")
summary(mod2)
plot(mod2)

#############################################################################
# EXAMPLE 3: Hierarchical rater model with discrete skill spaces
#############################################################################

data(data.ratings3, package="sirt")
dat <- data.ratings3
dat <- dat[ dat$rater < 814, ]
psych::describe(dat)

# Model 1: Discrete theta skill space with values of 0,1,2 and 3
mod1 <- sirt::rm.sdt( dat[, paste0( "crit",c(2:4)) ], theta.k=0:3, rater=dat$rater,
            pid=dat$idstud, est.c.rater="a", est.d.rater="a", skillspace="discrete" )
summary(mod1)
plot(mod1)

# Model 2: Modelling of one item by using a discrete skill space and
#          fixed item parameters

# fixed tau and a parameters
tau.item.fixed <- cbind( 1, 1:3,  100*cumsum( c( 0.5, 1.5, 2.5)) )
a.item.fixed <- cbind( 1, 100 )
# fit HRM-SDT
mod2 <- sirt::rm.sdt( dat[, "crit2", drop=FALSE], theta.k=0:3, rater=dat$rater,
            tau.item.fixed=tau.item.fixed, a.item.fixed=a.item.fixed, pid=dat$idstud,
            est.c.rater="a", est.d.rater="a", skillspace="discrete" )
summary(mod2)
plot(mod2)
# }