mice.impute.leftcenslognorm: Imputation for limits of detection problems

Description

This function integrates with mice to impute values below the LOD using a left censored log-normal distribution.

Usage

mice.impute.leftcenslognorm(
  y,
  ry,
  x,
  wy = NULL,
  lod = NULL,
  debug = FALSE,
  ...
)

Arguments

Vector to be imputed

Logical vector of length length(y) indicating the the subset of elements in y to which the imputation model is fitted. The ry generally distinguishes the observed (TRUE) and missing values (FALSE) in y.

Numeric design matrix with length(y) rows with predictors for y. Matrix x may have no missing values.

Logical vector of length length(y). A TRUE value indicates locations in y for which imputations are created.

lod

numeric vector of limits of detection (must correspond to index in original data) OR list in which each element corresponds to observation level limits of detection for each variable (list index must correspond to index in original data)

debug

logical, print extra info

...

arguments to survreg

Value

Vector with imputed data, same type as y, and of length sum(wy)

Details

While this function has utility far beyond qgcomp, it is included in the qgcomp package because it will be useful for a variety of settings in which qgcomp is useful. Note that LOD problems where the LOD is small, and the q parameter from qgcomp.noboot or qgcomp.boot is not large, the LOD may be below the lowest quantile cutpoint which will yield identical datasets from the MICE procedure in terms of quantized exposure data. If only exposures are missing, and they have low LODs, then there will be no benefit in qgcomp from using MICE rather than imputing some small value below the LOD.

Examples

Run this code

# NOT RUN {
N = 100
set.seed(123)
dat <- data.frame(y=runif(N), x1=runif(N), x2=runif(N), z=runif(N))
true = qgcomp.noboot(f=y ~ z + x1 + x2, expnms = c('x1', 'x2'),
        data=dat, q=2, family=gaussian())
mdat <- dat
mdat$x1 = ifelse(mdat$x1>0.5, mdat$x1, NA)
mdat$x2 = ifelse(mdat$x2>0.75, mdat$x2, NA)
cc <- qgcomp.noboot(f=y ~ z + x1 + x2, expnms = c('x1', 'x2'),
       data=mdat[complete.cases(mdat),], q=2, family=gaussian())

# }
# NOT RUN {
# note the following example imputes from the wrong parametric model and is expected to be biased
# as a result (but it demonstrates how to use qgcomp and mice together)
library("mice")
library("survival")
set.seed(1231)
impdat = mice(data = mdat,
  method = c("", "leftcenslognorm", "leftcenslognorm", ""),
  lod=c(NA, 0.5, 0.75, NA), debug=FALSE, m=10)
qc.fit.imp <- list(
  call = call("qgcomp.noboot(y~., expnms = c('x1', 'x2'), family=gaussian())"),
  call1 = impdat$call,
  nmis = impdat$nmis,
  analyses = lapply(1:10, function(x) qgcomp.noboot(y~., expnms = c("x1", "x2"),
    data=complete(impdat, x), family=gaussian(), bayes=TRUE))
 )
#alternative way to specify limits of detection (useful if not all observations have same limit)
lodlist = list(rep(NA, N), rep(0.5, N), rep(0.75, N), rep(NA, N))
#lodlist = data.frame(rep(NA, N), rep(0.5, N), rep(0.75, N), rep(NA, N)) # also works
set.seed(1231)
impdat_alt = mice(data = mdat,
  method = c("", "leftcenslognorm", "leftcenslognorm", ""),
  lod=lodlist, debug=FALSE, m=10)
qc.fit.imp_alt <- list(
  call = call("qgcomp.noboot(y~., expnms = c('x1', 'x2'), family=gaussian())"),
  call1 = impdat_alt$call,
  nmis = impdat_alt$nmis,
  analyses = lapply(1:10, function(x) qgcomp.noboot(y~., expnms = c("x1", "x2"),
    data=complete(impdat_alt, x), family=gaussian(), bayes=TRUE))
)
obj = pool(as.mira(qc.fit.imp))
obj_alt = pool(as.mira(qc.fit.imp_alt))
# true values
true
# complete case analysis
cc
# MI based analysis (identical answers for different ways to specify limits of detection)
summary(obj)
summary(obj_alt)

# summarizing weights (note that the weights should *not* be pooled 
#    because they mean different things depending on their direction)
expnms = c("x1", "x2")
wts = as.data.frame(t(sapply(qc.fit.imp$analyses, 
      function(x) c(-x$neg.weights, x$pos.weights)[expnms])))
eachwt = do.call(c, wts)
expwts = data.frame(Exposure = rep(expnms, each=nrow(wts)), Weight=eachwt)
library(ggplot2)
ggplot(data=expwts)+ theme_classic() +
  geom_point(aes(x=Exposure, y=Weight)) +
  geom_hline(aes(yintercept=0))


# now with survival data (very similar)
impdat = mice(data = mdat,
  method = c("", "leftcenslognorm", "leftcenslognorm", ""),
  lod=c(NA, 0.5, 0.75, NA), debug=FALSE)
qc.fit.imp <- list(
  call = call("qgcomp.cox.noboot(Surv(y)~., expnms = c('x1', 'x2'))"),
  call1 = impdat$call,
  nmis = impdat$nmis,
  analyses = lapply(1:5, function(x) qgcomp.cox.noboot(Surv(y)~., expnms = c("x1", "x2"),
    data=complete(impdat, x)))
)
obj = pool(as.mira(qc.fit.imp))
# MI based analysis
summary(obj)

# }

Run the code above in your browser using DataLab