rm.sdt: Hierarchical Rater Model Based on Signal Detection Theory (HRM-SDT)

Description

This function estimates a version of the hierarchical rater model (HRM) based on signal detection theory (HRM-SDT; DeCarlo, 2005; DeCarlo, Kim & Johnson, 2011).

Usage

rm.sdt(dat, pid, rater, Qmatrix = NULL, theta.k = seq(-9, 9, len = 30), 
    est.a.item = FALSE, est.c.rater = "n", est.d.rater = "n", est.mean=FALSE , 
    skillspace="normal" , tau.item.fixed = NULL , a.item.fixed = NULL , 
    d.min = 0.5, d.max = 100, d.start = 3, max.increment = 1, numdiff.parm = 0.00001, 
    maxdevchange = 0.1, globconv = .001, maxiter = 1000, msteps = 4, mstepconv = 0.001)

## S3 method for class 'rm.sdt':
summary(object,...)    

## S3 method for class 'rm.sdt':
plot(x, ask=TRUE, ...)

## S3 method for class 'rm.sdt':
anova(object,...)

## S3 method for class 'rm.sdt':
logLik(object,...)

## S3 method for class 'rm.sdt':
IRT.factor.scores(object, type="EAP", ...)

## S3 method for class 'rm.sdt':
IRT.irfprob(object,...)

## S3 method for class 'rm.sdt':
IRT.likelihood(object,...)

## S3 method for class 'rm.sdt':
IRT.posterior(object,...)

## S3 method for class 'rm.sdt':
IRT.modelfit(object,...)

## S3 method for class 'IRT.modelfit.rm.sdt':
summary(object,...)

Arguments

dat

Original data frame. Ratings on variables must be in rows, i.e. every row corresponds to a person-rater combination.

pid

Person identifier.

rater

Rater identifier.

Qmatrix

An optional Q-matrix. If this matrix is not provided, then by default the ordinary scoring of categories (from 0 to the maximum score of $K$) is used.

theta.k

A grid of theta values for the ability distribution.

est.a.item

Should item parameters $a_i$ be estimated?

est.c.rater

Type of estimation for item-rater parameters $c_{ir}$ in the signal detection model. Options are 'n' (no estimation), 'e' (set all parameters equal to each other), 'i' (item-wise estimation), 'r' (rat

est.d.rater

Type of estimation of $d$ parameters. Options are the same as in est.c.rater.

est.mean

Optional logical indicating whether the mean of the trait distribution should be estimated.

skillspace

Specified $\theta$ distribution type. It can be "normal" or "discrete". In the latter case, all probabilities of the distribution are separately estimated.

tau.item.fixed

Optional matrix with three columns specifying fixed $\tau$ parameters. The first two columns denote item and category indices, the third the fixed value. See Example 3.

a.item.fixed

Optional matrix with two columns specifying fixed $a$ parameters. First column: Item index. Second column: Fixed $a$ parameter.

d.min

Minimal $d$ parameter to be estimated

d.max

Maximal $d$ parameter to be estimated

d.start

Starting value of $d$ parameters

max.increment

Maximum increment of item parameters during estimation

numdiff.parm

Numerical differentiation step width

maxdevchange

Maximum relative deviance change as a convergence criterion

globconv

Maximum parameter change as a convergence criterion

maxiter

Maximum number of iterations

msteps

Maximum number of iterations during an M step

mstepconv

Convergence criterion in an M step

object

Object of class rm.sdt

ask

Optional logical indicating whether the user should be prompted before each new plot is shown.

type

Factor score estimation method. Up to now, only type="EAP" is supported.

...

Further arguments to be passed
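
The data layout expected for dat, pid and rater can be illustrated with a small base-R sketch. The values below are made up for illustration, and the column names idstud, rater, k1 and k2 are assumptions that only mimic the naming in Example 1. The check that each person-rater pair occurs at most once is one reading of "every row corresponds to a person-rater combination", not a documented requirement.

```r
# Toy long-format data: one row per person-rater combination,
# one column per rated variable (all values are invented).
dat_long <- data.frame(
  idstud = c(1, 1, 2, 2, 3),
  rater  = c("R1", "R2", "R1", "R3", "R2"),
  k1     = c(0, 1, 2, 2, 1),
  k2     = c(1, 1, 3, 2, 0)
)

# Assumed sanity check: a person-rater pair should not be repeated.
pair_is_duplicated <- duplicated(dat_long[, c("idstud", "rater")])
stopifnot(!any(pair_is_duplicated))

# The rating columns go into dat; identifiers are passed separately.
items <- dat_long[, c("k1", "k2")]

# Not run here (needs the package, and the toy data is far too small to fit):
# mod <- rm.sdt(items, pid = dat_long$idstud, rater = dat_long$rater)
```

Keeping the identifiers outside dat matches the calls in the Examples section, where dat is subset to the rating columns and pid and rater are supplied as separate vectors.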

Value

A list with the following entries:

deviance

Deviance

ic

Information criteria and number of parameters

item

Data frame with item parameters. The columns N and M denote the number of observed ratings and the observed mean of all ratings, respectively. In addition to item parameters $\tau_{ik}$ and $a_i$, the mean for the latent response (latM) is computed as $E( \eta_i ) = \sum_p \sum_k P( \theta_p ) q_{ik} P( \eta_i = k | \theta_p )$, which provides an item parameter on the original metric of ratings. The latent standard deviation (latSD) is computed in the same manner.

rater

Data frame with rater parameters. Transformed $c$ parameters (c_x.trans) are computed as $c_{irk} / ( d_{ir} )$.

person

Data frame with person parameters: EAP and corresponding standard errors

EAP.rel

EAP reliability

mu

Mean of the trait distribution

sigma

Standard deviation of the trait distribution

tau.item

Item parameters $\tau_{ik}$

se.tau.item

Standard error of item parameters $\tau_{ik}$

a.item

Item slopes $a_i$

se.a.item

Standard error of item slopes $a_i$

c.rater

Rater parameters $c_{irk}$

se.c.rater

Standard error of rater severity parameter $c_{irk}$

d.rater

Rater slope parameter $d_{ir}$

se.d.rater

Standard error of rater slope parameter $d_{ir}$

f.yi.qk

Individual likelihood

f.qk.yi

Individual posterior distribution

probs

Item probabilities at grid theta.k. Note that these probabilities are calculated on the pseudo items $i \times r$, i.e. the interaction of item and rater.

prob.item

Probabilities $P( \eta_i = \eta | \theta )$ of latent item responses evaluated at theta grid $\theta_p$.

n.ik

Expected counts

pi.k

Estimated trait distribution $P(\theta_p)$.

maxK

Maximum number of categories

procdata

Processed data

iter

Number of iterations

...

Further values
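
The latent mean described for the item entry can be sketched in base R as follows. The objects theta_k, pi_k, q_i, tau_i and prob_eta are hypothetical stand-ins with invented values; they do not reproduce the layout of the returned prob.item or pi.k entries. Taking the latent standard deviation as the square root of the matching second central moment is an assumption about what "in the same manner" means.

```r
# Hypothetical theta grid with normalized normal weights P(theta_p)
theta_k <- seq(-4, 4, length.out = 21)
pi_k <- dnorm(theta_k)
pi_k <- pi_k / sum(pi_k)

# Hypothetical item: scores q_ik = 0..3, slope a_i = 1, invented tau values
q_i   <- 0:3
tau_i <- c(0, -0.5, 0.2, 1.1)

# P(eta_i = k | theta_p): grid points in rows, latent categories in columns
prob_eta <- t(sapply(theta_k, function(th) {
  w <- exp(q_i * th - tau_i)
  w / sum(w)
}))

# Marginal distribution of the latent response, then its mean and SD
marg  <- as.vector(pi_k %*% prob_eta)
latM  <- sum(q_i * marg)
latSD <- sqrt(sum(q_i^2 * marg) - latM^2)
```

Because latM is a weighted average of the category scores, it always lies between the smallest and largest score, which is what places it on the original rating metric.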

Details

The specification of the model follows DeCarlo et al. (2011). The second level models the ideal rating (latent response) $\eta = 0, \ldots, K$ of person $p$ on item $i$:

$$P( \eta_{pi} = \eta | \theta_p ) \propto \exp( a_{i} q_{i\eta} \theta_p - \tau_{i\eta} )$$

Here the subscript $\eta$ plays the role of the category index $k$ in the item parameters $\tau_{ik}$ and scores $q_{ik}$.

At the first level, the ratings $X_{pir}$ for person $p$ on item $i$ and rater $r$ are modelled by a signal detection model

$$P( X_{pir} \le k | \eta_{pi} ) = G( c_{irk} - d_{ir} \eta_{pi} )$$

where $G$ is the logistic distribution function and the categories are $k=1,\ldots , K+1$. Because $1 - G(x) = G(-x)$, the item response model can be equivalently written as

$$P( X_{pir} > k | \eta_{pi} ) = G( d_{ir} \eta_{pi} - c_{irk})$$

The thresholds $c_{irk}$ can be further restricted to $c_{irk} = c_{k}$ (est.c.rater='e'), $c_{irk} = c_{ik}$ (est.c.rater='i') or $c_{irk} = c_{ir}$ (est.c.rater='r'). The same holds for rater precision parameters $d_{ir}$.
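
As a minimal base-R sketch of the two levels described above (not the package's internal code), the following computes latent category probabilities and rating probabilities for one item and one rater. All parameter values and the helper names latent_probs and rating_probs are hypothetical. Two assumptions are made explicit in the comments: $\tau$ for category 0 is set to 0, and thresholds exist only for $k = 1, \ldots, K$, so the cumulative probability of the last category $K+1$ is 1.

```r
# Hypothetical item with K = 3, i.e. latent categories eta = 0..3
K     <- 3
q_i   <- 0:K                    # ordinary scoring of categories
a_i   <- 1                      # item slope
tau_i <- c(0, -0.5, 0.2, 1.1)   # assumption: tau for eta = 0 fixed at 0

# Level 2: P(eta = 0..K | theta), normalized over categories
latent_probs <- function(theta) {
  w <- exp(a_i * q_i * theta - tau_i)
  w / sum(w)
}

# Hypothetical rater: increasing thresholds c_irk (k = 1..K), precision d_ir
c_ir <- c(1.5, 4.5, 7.5)
d_ir <- 3

# Level 1: P(X = 1..K+1 | eta) from cumulative logistic probabilities
rating_probs <- function(eta) {
  # P(X <= k | eta) for k = 1..K; assumption: P(X <= K+1 | eta) = 1
  cum <- c(plogis(c_ir - d_ir * eta), 1)
  diff(c(0, cum))
}

p_eta         <- latent_probs(theta = 0.5)
p_x_given_eta <- t(sapply(0:K, rating_probs))   # rows: eta, columns: rating
p_x           <- as.vector(p_eta %*% p_x_given_eta)  # ratings given theta
```

With a large d_ir relative to the threshold spacing, each row of p_x_given_eta concentrates on a single rating category, which is why d_ir is described as a precision parameter.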

References

DeCarlo, L. T. (2005). A model of rater behavior in essay grading based on signal detection theory. Journal of Educational Measurement, 42, 53-76.

DeCarlo, L. T. (2010). Studies of a latent-class signal-detection model for constructed response scoring II: Incomplete and hierarchical designs. ETS Research Report ETS RR-10-08. Princeton NJ: ETS.

DeCarlo, L. T., Kim, Y., & Johnson, M. S. (2011). A hierarchical rater model for constructed responses, with a signal detection rater model. Journal of Educational Measurement, 48, 333-356.

Examples


#############################################################################
# EXAMPLE 1: Hierarchical rater model (HRM-SDT) data.ratings1
#############################################################################
data(data.ratings1)
dat <- data.ratings1

# Model 1: Partial Credit Model: no rater effects
mod1 <- rm.sdt( dat[ , paste0( "k",1:5) ] , rater=dat$rater , 
            pid=dat$idstud , est.c.rater="n" , est.d.rater="n" , maxiter=15)
summary(mod1)
            
# Model 2: Generalized Partial Credit Model: no rater effects
mod2 <- rm.sdt( dat[ , paste0( "k",1:5) ] , rater=dat$rater , 
            pid=dat$idstud , est.c.rater="n" , est.d.rater="n" , 
            est.a.item =TRUE , d.start=100 , maxiter=15)
summary(mod2)
            
# Model 3: Equal effects in SDT
mod3 <- rm.sdt( dat[ , paste0( "k",1:5) ] , rater=dat$rater , 
            pid=dat$idstud , est.c.rater="e" , est.d.rater="e" , maxiter=15)
summary(mod3)

# Model 4: Rater effects in SDT
mod4 <- rm.sdt( dat[ , paste0( "k",1:5) ] , rater=dat$rater , 
            pid=dat$idstud , est.c.rater="r" , est.d.rater="r" , maxiter=15)
summary(mod4)

#############################################################################
# EXAMPLE 2: HRM-SDT data.ratings3
#############################################################################

data(data.ratings3)
dat <- data.ratings3
dat <- dat[ dat$rater < 814 , ]
psych::describe(dat)
            
# Model 1: item- and rater-specific effects
mod1 <- rm.sdt( dat[ , paste0( "crit",c(2:4)) ] , rater=dat$rater , 
            pid=dat$idstud , est.c.rater="a" , est.d.rater="a" , maxiter=10)
summary(mod1)
plot(mod1)

# Model 2: Differing number of categories per variable
mod2 <- rm.sdt( dat[ , paste0( "crit",c(2:4,6)) ] , rater=dat$rater , 
            pid=dat$idstud , est.c.rater="a" , est.d.rater="a" , maxiter=10)
summary(mod2)
plot(mod2)

#############################################################################
# EXAMPLE 3: Hierarchical rater model with discrete skill spaces
#############################################################################

data(data.ratings3)
dat <- data.ratings3
dat <- dat[ dat$rater < 814 , ]
psych::describe(dat)

# Model 1: Discrete theta skill space with values of 0,1,2 and 3
mod1 <- rm.sdt( dat[ , paste0( "crit",c(2:4)) ] , theta.k = 0:3 , rater=dat$rater , 
            pid=dat$idstud , est.c.rater="a" , est.d.rater="a" , skillspace="discrete" ,
            maxiter=20)
summary(mod1)
plot(mod1)

# Model 2: Modelling of one item by using a discrete skill space and
#          fixed item parameters

# fixed tau and a parameters
tau.item.fixed <- cbind( 1, 1:3,  100*cumsum( c( 0.5, 1.5, 2.5)) )
a.item.fixed <- cbind( 1, 100 )
# fit HRM-SDT 
mod2 <- rm.sdt( dat[ , "crit2" , drop=FALSE] , theta.k = 0:3 , rater=dat$rater , 
            tau.item.fixed=tau.item.fixed ,a.item.fixed=a.item.fixed, pid=dat$idstud, 
            est.c.rater="a", est.d.rater="a", skillspace="discrete", maxiter=20)
summary(mod2)            
plot(mod2)
