qgcomp.partials: Partial effect sizes, confidence intervals, hypothesis tests

Description

Obtain effect estimates for "partial positive" and "partial negative" effects using quantile g-computation. This approach uses sample splitting to evaluate the overall impact of a set of variables with effects in a single direction, where, using training data, all variables with effects in the same direction are grouped.

Usage

qgcomp.partials(
  fun = c("qgcomp.glm.noboot", "qgcomp.cox.noboot", "qgcomp.zi.noboot"),
  traindata = NULL,
  validdata = NULL,
  expnms = NULL,
  .fixbreaks = TRUE,
  .globalbreaks = FALSE,
  ...
)

Value

A 'qgcompmultifit' object, which inherits from list, which contains

posmix: character vector of variable names with positive coefficients in the qgcomp model fit to the training data
negmix: character vector of variable names with negative coefficients in the qgcomp model fit to the training data
pos.fit: a qgcompfit object fit to the validation data, in which the exposures of interest are contained in 'posmix'
neg.fit: a qgcompfit object fit to the validation data, in which the exposures of interest are contained in 'negmix'

Arguments

fun: character variable in the set "qgcomp.glm.noboot" (binary, count, continuous outcomes), "qgcomp.cox.noboot" (survival outcomes), "qgcomp.zi.noboot" (zero inflated outcomes). This describes which qgcomp package function is used to fit the model. (default = "qgcomp.glm.noboot")
traindata: Data frame with training data
validdata: Data frame with validation data
expnms: Exposure mixture of interest
.fixbreaks: (logical, overridden by .globalbreaks) Use the same quantile cutpoints in the training and validation data (selected in the training data). As of version 2.8.11, the default is TRUE, whereas it was implicitly FALSE in prior verions. Setting to TRUE increases variance but greatly decreases bias in smaller samples.
.globalbreaks: (logical, if TRUE, overrides .fixbreaks) Use the same quantile cutpoints in the training and validation data (selected in combined training and validation data). As of version 2.8.11, the default is TRUE, whereas it was implicitly FALSE in prior verions. Setting to TRUE increases variance but greatly decreases bias in smaller samples.
...: Arguments to qgcomp.glm.noboot, qgcomp.cox.noboot, or qgcomp.zi.noboot

Details

In the basic (non bootstrapped) qgcomp functions, the positive and negative "sums of coefficients" or "partial effect sizes" are given, which equal the sum of the negative and positive coefficients in the underlying model. Unfortunately, these partial effects don't constitute variables for which we can derive confidence intervals or hypothesis tests, so they are mainly for exploratory purposes. By employing sample splitting, however, we can obtain better estimates of these partial effects.

Sample splitting proceeds by partitioning the data into two samples (40/60 training/validtion split seems acceptable in many circumstances). The "overall mixture effect" is then estimated in the training data, and the mixture variables with positive and negative coefficients are split into separate groups. These two different groups are then used as "the mixture of interest" in two additional qgcomp fits, where the mixture of interest is adjusted for the other exposure variables. For example, if the "positive partial effect" is of interest, then this effect is equal to the sum of the coefficients in the qgcomp model fit to the validation data, with the mixture of interest selected by the original fit to the training data (note that some of these coefficients may be negative in the fit to the validation data - this is expected and necessary for valid hypothesis tests).

The positive/negative partial effects are necessarily exploratory, but sample splitting preserves the statistical properties at the expense of wider confidence intervals and larger variances. The two resulting mixture groups groups should be inspected for

Examples

Run this code

set.seed(123223)
dat = qgcomp::simdata_quantized(n=1000, outcomtype="continuous", cor=c(.75, 0), 
                                b0=0, coef=c(0.25,-0.25,0,0), q=4)
cor(dat)
# overall fit (more or less null due to counteracting exposures)
(overall <- qgcomp.glm.noboot(f=y~., q=NULL, expnms=c("x1", "x2", "x3", "x4"), data=dat))

# partial effects using 40% training/60% validation split
trainidx <- sample(1:nrow(dat), round(nrow(dat)*0.4))
valididx <- setdiff(1:nrow(dat),trainidx)
traindata = dat[trainidx,]
validdata = dat[valididx,]
splitres <- qgcomp.partials(fun="qgcomp.glm.noboot", f=y~., q=NULL, 
    traindata=traindata,validdata=validdata, expnms=c("x1", "x2", "x3", "x4"))
splitres
if (FALSE) {
# under the null, both should give null results
set.seed(123223)
dat = simdata_quantized(n=1000, outcomtype="continuous", cor=c(.75, 0), 
                                b0=0, coef=c(0,0,0,0), q=4)
# 40% training/60% validation
trainidx2 <- sample(1:nrow(dat), round(nrow(dat)*0.4))
valididx2 <- setdiff(1:nrow(dat),trainidx2)
traindata2 <- dat[trainidx2,]
validdata2 <- dat[valididx2,]
splitres2 <- qgcomp.partials(fun="qgcomp.glm.noboot", f=y~., 
   q=NULL, traindata=traindata2,validdata=validdata2, expnms=c("x1", "x2", "x3", "x4"))
splitres2

# 60% training/40% validation
trainidx3 <- sample(1:nrow(dat), round(nrow(dat)*0.6))
valididx3 <- setdiff(1:nrow(dat),trainidx3)
traindata3 <- dat[trainidx3,]
validdata3 <- dat[valididx3,]
splitres3 <- qgcomp.partials(fun="qgcomp.glm.noboot", f=y~., q=NULL, 
    traindata=traindata3,validdata=validdata3, expnms=c("x1", "x2", "x3", "x4"))
splitres3

# survival outcome
set.seed(50)
N=1000
dat = simdata_quantized(n=1000, outcomtype="survival", cor=c(.75, 0, 0, 0, 1), 
                                b0=0, coef=c(1,0,0,0,0,1), q=4)
names(dat)[which(names(dat=="x5"))] = "z"
trainidx4 <- sample(1:nrow(dat), round(nrow(dat)*0.6))
valididx4 <- setdiff(1:nrow(dat),trainidx4)
traindata4 <- dat[trainidx4,]
validdata4 <- dat[valididx4,]
expnms=paste0("x", 1:5)
f = survival::Surv(time, d)~x1 + x2 + x3 + x4 + x5 + z
(fit1 <- survival::coxph(f, data = dat))
(overall <- qgcomp.cox.noboot(f, expnms = expnms, data = dat))
(splitres4 <- qgcomp.partials(fun="qgcomp.cox.noboot", f=f, q=4,
               traindata=traindata4,validdata=validdata4,
                expnms=expnms))
 
 # zero inflated count outcome
 set.seed(50)
 n=1000
 dat <- data.frame(y= (yany <- rbinom(n, 1, 0.5))*(ycnt <- rpois(n, 1.2)), x1=runif(n)+ycnt*0.2, 
                       x2=runif(n)-ycnt*0.2, x3=runif(n),
                      x4=runif(n) , z=runif(n))
 # poisson count model, mixture in both portions, but note that the qgcomp.partials
 # function defines the "positive" variables only  by the count portion of the model
 (overall5 <- qgcomp.zi.noboot(f=y ~ z + x1 + x2 + x3 + x4 | x1 + x2 + x3 + x4 + z, 
                 expnms = c("x1", "x2", "x3", "x4"), 
                  data=dat, q=4, dist="poisson"))

trainidx5 <- sample(1:nrow(dat), round(nrow(dat)*0.6))
valididx5 <- setdiff(1:nrow(dat),trainidx5)
traindata5 <- dat[trainidx5,]
validdata5 <- dat[valididx5,]
splitres5 <- qgcomp.partials(fun="qgcomp.zi.noboot", 
    f=y ~ x1 + x2 + x3 + x4 + z | x1 + x2 + x3 + x4 + z, q=4, 
    traindata=traindata5, validdata=validdata5, 
    expnms=c("x1", "x2", "x3", "x4"))
splitres5
                
}