plausible.value.imputation.raschtype: Plausible Value Imputation in Generalized Logistic Item Response Model

Description

This function performs unidimensional plausible value imputation (Adams & Wu, 2007; Mislevy, 1991).

Usage

plausible.value.imputation.raschtype(data=NULL, f.yi.qk=NULL, X, 
   Z=NULL, beta0=rep(0, ncol(X)), sig0=1, b=rep(1, ncol(X)), 
   a=rep(1, length(b)), c=rep(0, length(b)), d=1+0*b, 
   alpha1=0, alpha2=0, theta.list=seq(-5, 5, len=50), 
   cluster=NULL, iter, burnin, nplausible=1, printprogress=TRUE)

Arguments

data

An $N \times I$ data frame of dichotomous responses

f.yi.qk

An optional matrix which contains the individual likelihood. This matrix is produced by rasch.mml2 or rasch.copula2. The use of this argument

X

A matrix of individual covariates for the latent regression of $\theta$ on $X$

Z

A matrix of individual covariates for the regression of individual residual variances on $Z$

beta0

Initial vector of regression coefficients

sig0

Initial vector of coefficients for the variance heterogeneity model

b

Vector of item difficulties. It must not be provided if the individual likelihood f.yi.qk is specified.

a

Optional vector of item slopes

c

Optional vector of lower item asymptotes

d

Optional vector of upper item asymptotes

alpha1

Parameter $\alpha_1$ in generalized item response model

alpha2

Parameter $\alpha_2$ in generalized item response model

theta.list

Vector of theta values at which the ability distribution should be evaluated

cluster

Cluster identifier (e.g. schools or classes) for including cluster means of theta in the plausible value imputation.

iter

Number of iterations

burnin

Number of burn-in iterations for plausible value imputation

nplausible

Number of plausible values

printprogress

A logical indicating whether iteration progress should be displayed at the console.
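
The relation between data, the item parameters and f.yi.qk can be illustrated with a minimal sketch in base R. It assumes the default alpha1 = alpha2 = 0 and takes the item response function to be c + (d - c) * plogis(a * (theta - b)). The helper name individual_likelihood is hypothetical and not part of the package; the matrices produced by rasch.mml2 or rasch.copula2 may be scaled differently.

```r
# Hypothetical helper (not part of sirt): individual likelihood on a theta grid.
# Assumption: alpha1 = alpha2 = 0, i.e. a logistic link function.
individual_likelihood <- function(data, b, a = rep(1, length(b)),
                                  c = rep(0, length(b)), d = rep(1, length(b)),
                                  theta.list = seq(-5, 5, length.out = 50)) {
  data <- as.matrix(data)
  lik <- matrix(1, nrow = nrow(data), ncol = length(theta.list))
  for (i in seq_along(b)) {
    # probability of a correct response to item i at each grid point
    p_i <- c[i] + (d[i] - c[i]) * plogis(a[i] * (theta.list - b[i]))
    resp <- data[, i]
    # persons x grid points: p_i for a response of 1, 1 - p_i for a response of 0
    lik_i <- outer(resp, p_i) + outer(1 - resp, 1 - p_i)
    lik_i[is.na(resp), ] <- 1   # missing responses do not contribute
    lik <- lik * lik_i
  }
  lik
}

set.seed(1)
b <- c(-1, 0, 1)
theta <- rnorm(5)
# simulate dichotomous Rasch responses for 5 persons and 3 items
resp <- 1 * (matrix(runif(15), 5, 3) < plogis(outer(theta, b, "-")))
f.yi.qk <- individual_likelihood(resp, b = b,
                                 theta.list = seq(-4, 4, length.out = 21))
dim(f.yi.qk)
```

Each row holds one person's likelihood evaluated at the grid points, which is the shape that would be passed as f.yi.qk together with the matching theta.list.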

Value

A list with the following entries:

coefs.X

Sampled regression coefficients for covariates $X$

coefs.Z

Sampled coefficients for modeling variance heterogeneity for covariates $Z$

pvdraws

Matrix with drawn plausible values

posterior

Posterior distribution from last iteration

EAP

Individual EAP estimate

SE.EAP

Standard error of the EAP estimate

pv.indexes

Index of iterations for which plausible values were drawn
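
When more than one plausible value is drawn, analyses are usually run once per column of pvdraws and then combined. The following sketch pools regression coefficients with Rubin's rules in base R. The matrix pvdraws is simulated here as a stand-in for the returned entry, and all object names are illustrative assumptions rather than package functionality.

```r
set.seed(42)
n <- 500
M <- 5
x <- rnorm(n)
# stand-in for the pvdraws entry: an n x M matrix (simulated for illustration)
pvdraws <- sapply(seq_len(M), function(m) 0.5 * x + rnorm(n))

# fit the same regression to every plausible value
fits <- lapply(seq_len(M), function(m) lm(pvdraws[, m] ~ x))
est  <- sapply(fits, coef)                        # one column per plausible value
vars <- sapply(fits, function(f) diag(vcov(f)))   # squared standard errors

# Rubin's rules: pooled estimate, within and between imputation variance
pooled_est <- rowMeans(est)
within     <- rowMeans(vars)
between    <- apply(est, 1, var)
total      <- within + (1 + 1 / M) * between
cbind(estimate = pooled_est, se = sqrt(total))
```

Analyses based on a single column, as in the examples, ignore the between-imputation component and therefore tend to understate standard errors.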

Details

Plausible values are drawn from the latent regression model with heterogeneous variances: $$\theta_p = X_p \beta + \epsilon_p \quad , \quad \epsilon_p \sim N( 0 , \sigma_p^2 ) \quad , \quad \log( \sigma_p ) = Z_p \gamma + \nu_p$$
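
To make the model concrete, the following sketch simulates abilities from a latent regression with heterogeneous variances in base R. The values of beta and gamma are illustrative assumptions, and the residual $\nu_p$ is set to zero for simplicity, so this is not the sampling scheme used by the function.

```r
set.seed(123)
n <- 1000
X <- cbind(1, rbinom(n, 1, 0.5))   # intercept and one binary covariate
Z <- X                             # same covariates for the variance model
beta  <- c(0, 0.4)                 # illustrative regression coefficients
gamma <- c(0, -0.5)                # illustrative log standard deviation coefficients

# log(sigma_p) = Z_p gamma, with nu_p = 0 (simplifying assumption)
sigma_p <- exp(as.vector(Z %*% gamma))
# theta_p = X_p beta + epsilon_p, epsilon_p ~ N(0, sigma_p^2)
theta <- as.vector(X %*% beta) + rnorm(n, sd = sigma_p)

# the residual standard deviation differs between the two covariate groups
tapply(theta, X[, 2], sd)
```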

References

Adams, R., & Wu, M. (2007). The mixed-coefficients multinomial logit model: A generalized form of the Rasch model. In M. von Davier & C. H. Carstensen (Eds.), Multivariate and Mixture Distribution Rasch Models: Extensions and Applications (pp. 57-76). New York: Springer.

Mislevy, R. J. (1991). Randomization-based inference about latent variables from complex samples. Psychometrika, 56, 177-196.

Examples

#############################################################################
# SIMULATED EXAMPLE 1: Rasch model with covariates
#############################################################################

library(sirt)   # provides sim.raschtype and plausible.value.imputation.raschtype

set.seed(899)
I <- 21     # number of items
b <- seq(-2,2, len=I)   # item difficulties
n <- 2000       # number of students

# simulate theta and covariates
theta <- rnorm( n )
# the second argument of rnorm is the mean (the standard deviation stays at 1)
x <- .7 * theta + rnorm( n , mean=.5 )
y <- .2 * x+ .3*theta + rnorm( n , mean=.4 )
dfr <- data.frame( theta , 1 , x , y )

# simulate Rasch model
dat1 <- sim.raschtype( theta = theta , b = b )

# Plausible value draws
pv1 <- plausible.value.imputation.raschtype(data=dat1 , X=dfr[,-1] , b = b ,
            nplausible=3 , iter=10 , burnin=5)
# estimate linear regression based on first plausible value
mod1 <- lm( pv1$pvdraws[,1] ~ x+y )
summary(mod1)
  ##               Estimate Std. Error t value Pr(>|t|)    
  ##   (Intercept) -0.27755    0.02121  -13.09   <2e-16 ***
  ##   x            0.40483    0.01640   24.69   <2e-16 ***
  ##   y            0.20307    0.01822   11.15   <2e-16 ***

# true regression estimate
summary( lm( theta ~ x + y ) )
  ## Coefficients:
  ##             Estimate Std. Error t value Pr(>|t|)    
  ## (Intercept) -0.27821    0.01984  -14.02   <2e-16 ***
  ## x            0.40747    0.01534   26.56   <2e-16 ***
  ## y            0.18189    0.01704   10.67   <2e-16 ***

#############################################################################
# SIMULATED EXAMPLE 2: Classical test theory, homogeneous regression variance
#############################################################################

set.seed(899)
n <- 3000       # number of students
x <- round( runif( n , 0 ,1 ) )
y <- rnorm(n)
# simulate true score theta
theta <- .4*x + .5 * y + rnorm(n)
# simulate observed score by adding measurement error
sig.e <- rep( sqrt(.40) , n )
theta_obs <- theta + rnorm( n , sd=sig.e)

# define theta grid for evaluation of density
theta.list <- mean(theta_obs) + sd(theta_obs) * seq( - 5 , 5 , length=21)
# compute individual likelihood
f.yi.qk <- dnorm( outer( theta_obs , theta.list , "-" ) / sig.e )
f.yi.qk <- f.yi.qk / rowSums(f.yi.qk)
# define covariates
X <- cbind( 1 , x , y )
# draw plausible values        
mod2 <- plausible.value.imputation.raschtype( f.yi.qk =f.yi.qk , 
                  theta.list=theta.list , X=X , iter=10 , burnin=5)

# linear regression
mod1 <- lm( mod2$pvdraws[,1] ~ x+y )
summary(mod1)
  ##             Estimate Std. Error t value Pr(>|t|)    
  ## (Intercept) -0.01393    0.02655  -0.525      0.6    
  ## x            0.35686    0.03739   9.544   <2e-16 ***
  ## y            0.53759    0.01872  28.718   <2e-16 ***

# true regression model
summary( lm( theta ~ x + y ) )
  ##             Estimate Std. Error t value Pr(>|t|)    
  ## (Intercept) 0.002931   0.026171   0.112    0.911    
  ## x           0.359954   0.036864   9.764   <2e-16 ***
  ## y           0.509073   0.018456  27.584   <2e-16 ***

#############################################################################
# SIMULATED EXAMPLE 3: Classical test theory, heterogeneous regression variance
#############################################################################

set.seed(899)
n <- 5000       # number of students
x <- round( runif( n , 0 ,1 ) )
y <- rnorm(n)
# simulate true score theta
theta <- .4*x + .5 * y + rnorm(n) * ( 1 - .4 * x )
# simulate observed score by adding measurement error
sig.e <- rep( sqrt(.40) , n )
theta_obs <- theta + rnorm( n , sd=sig.e)

# define theta grid for evaluation of density
theta.list <- mean(theta_obs) + sd(theta_obs) * seq( - 5 , 5 , length=21)
# compute individual likelihood
f.yi.qk <- dnorm( outer( theta_obs , theta.list , "-" ) / sig.e )
f.yi.qk <- f.yi.qk / rowSums(f.yi.qk)
# define covariates
X <- cbind( 1 , x , y )
# draw plausible values (assuming variance homogeneity)
mod3a <- plausible.value.imputation.raschtype( f.yi.qk =f.yi.qk , 
                  theta.list=theta.list , X=X , iter=10 , burnin=5)
# draw plausible values (assuming variance heterogeneity)
#  -> include predictor Z
mod3b <- plausible.value.imputation.raschtype( f.yi.qk =f.yi.qk , 
                  theta.list=theta.list , X=X , Z=X , iter=10 , burnin=5)

# investigate variance of theta conditional on x
res3 <- sapply( 0:1 , FUN = function(vv){
        c( var(theta[x==vv]), var(mod3b$pvdraws[x==vv,1]),
              var(mod3a$pvdraws[x==vv,1]))})
rownames(res3) <- c("true" , "pv(hetero)" , "pv(homog)" )
colnames(res3) <- c("x=0","x=1")     
round( res3 , 2 )
  ##             x=0  x=1
  ## true       1.30 0.58
  ## pv(hetero) 1.29 0.55
  ## pv(homog)  1.06 0.77
## -> assuming heteroscedastic variances recovers true conditional variance
