qgcomp.glm.ee: Quantile g-computation for continuous and binary outcomes

Description

This function estimates a dose-response parameter representing a one quantile increase in a set of exposures of interest. This model estimates the parameters of a marginal structural model (MSM) based on g-computation with quantized exposures. Note: this function allows non-linear and non-additive effects of individual components of the exposure, as well as non-linear joint effects of the mixture via polynomial basis functions, which increase the computational computational burden due to the need for non-parametric bootstrapping.

Estimating equation methodology is used as the underlying estimation scheme. This allows that observations can be correlated, and is fundementally identical to some implementations of "generalized estimating equations" or GEE. Thus, it allows for a more general set of longitudinal data structures than does the qgcomp.glm.noboot function, because it allows that the outcome can vary over time within an individual. Interpretation of parameters is similar to that of a GEE: this function yields population average estimates of the effect of exposure over time.

Note: GEE (and this function, by extension) does not automatically address all problems of longitudinal data, such as lag structures. Those are up to the investigator to specify correctly.

Note: qgcomp.ee is an equivalent function (slated for deprecation)

Usage

qgcomp.glm.ee(
  f,
  data,
  expnms = NULL,
  q = 4,
  breaks = NULL,
  id = NULL,
  weights,
  offset = NULL,
  alpha = 0.05,
  rr = TRUE,
  degree = 1,
  seed = NULL,
  includeX = TRUE,
  verbose = TRUE,
  ...
)
qgcomp.ee(
  f,
  data,
  expnms = NULL,
  q = 4,
  breaks = NULL,
  id = NULL,
  weights,
  offset = NULL,
  alpha = 0.05,
  rr = TRUE,
  degree = 1,
  seed = NULL,
  includeX = TRUE,
  verbose = TRUE,
  ...
)

Value

a qgcompfit object, which contains information about the effect measure of interest (psi) and associated variance (var.psi), as well as information on the model fit (fit) and information on the marginal structural model (msmfit) used to estimate the final effect estimates.

Arguments

f: R style formula
data: data frame
expnms: character vector of exposures of interest
q: NULL or number of quantiles used to create quantile indicator variables representing the exposure variables. If NULL, then gcomp proceeds with un-transformed version of exposures in the input datasets (useful if data are already transformed, or for performing standard g-computation)
breaks: (optional) NULL, or a list of (equal length) numeric vectors that characterize the minimum value of each category for which to break up the variables named in expnms. This is an alternative to using 'q' to define cutpoints.
id: (optional) NULL, or variable name indexing individual units of observation (only needed if analyzing data with multiple observations per id/cluster). Note that qgcomp.glm.noboot will not produce cluster-appropriate standard errors. qgcomp.glm.ee can be used for this, which will use bootstrap sampling of clusters/individuals to estimate cluster-appropriate standard errors via bootstrapping.
weights: NULL or "case weights" - sampling weights representing a proportional representation in the analysis data
offset: Not yet implemented glm or bayesglm
alpha: alpha level for confidence limit calculation
rr: logical: if using binary outcome and rr=TRUE, qgcomp.glm.ee will estimate risk ratio rather than odds ratio
degree: polynomial bases for marginal model (e.g. degree = 2 allows that the relationship between the whole exposure mixture and the outcome is quadratic (default = 1).
seed: integer or NULL: random number seed for replicable bootstrap results
includeX: (logical) should the design/predictor matrix be included in the output via the fit and msmfit parameters?
verbose: (logical) give extra messages about fits
...: arguments to glm (e.g. family)

Details

Estimates correspond to the average expected change in the (log) outcome per quantile increase in the joint exposure to all exposures in `expnms'. Test statistics and confidence intervals are based on a cluster-robust variance that is available in estimating equation methods, which are sometimes referred to as "M-estimators." The robust standard error is then used to estimate Wald-type confidence intervals. Note that no error is assumed for estimated quantiles of exposure, so these are treated as fixed quantities

There is an argument to "family" that is optional, but it defaults to gaussian. Current options allow "gaussian" (or gaussian()), "poisson" or "binomial." This function defaults to canonical links (binomial -> logistic link, gaussian -> identity link). However, setting family="binomial" and rr=TRUE uses a marginal structural model with a log-link (and an underlying conditional model with a logistic link). This mimics the behavior of qgcomp.glm.boot which maximizes backwards compatibility but also is a useful default to avoid convergence issues when using the log-link for conditional models. See examples below.

Examples

Run this code

set.seed(30)
# continuous outcome
dat <- data.frame(y=rnorm(100), x1=runif(100), x2=runif(100), z=runif(100))
# Conditional linear slope
qgcomp.glm.noboot(y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=4, family=gaussian())
# Marginal linear slope (population average slope, for a purely linear,
#  additive model this will equal the conditional)
qgcomp.glm.ee(f=y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=4,
  family=gaussian()) 
qgcomp.glm.ee(f=y ~ x1 + x2 + I(x1*x2) + z, expnms = c('x1', 'x2'), data=dat, q=4,
  family=gaussian()) 
# no intercept model
qgcomp.glm.ee(f=y ~ -1+z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=4,
  family=gaussian()) 

 # Note that these give different answers! In the first, the estimate is conditional on Z,
 # but in the second, Z is marginalized over via standardization. The estimates
 # can be made approximately the same by centering Z (for linear models), but
 # the conditional estimate will typically have lower standard errors.
 dat$z = dat$z - mean(dat$z)

# Conditional linear slope
qgcomp.glm.noboot(y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=4, family=gaussian())
# Marginal linear slope (population average slope, for a purely linear,
#  additive model this will equal the conditional)

qgcomp.glm.ee(f=y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat, q=4,
  family=gaussian()) 
 if (FALSE) {

# Population average mixture slope which accounts for non-linearity and interactions
# Note this is one use case for the estimating equations approach: replacing bootstrapping
qgcomp.glm.ee(y ~ z + x1 + x2 + I(x1^2) + I(x2*x1), family="gaussian",
 expnms = c('x1', 'x2'), data=dat, q=4)
qgcomp.glm.boot(y ~ z + x1 + x2 + I(x1^2) + I(x2*x1), family="gaussian",
 expnms = c('x1', 'x2'), data=dat, q=4, B=1000)

# generally non-linear/non-addiive underlying models lead to non-linear mixture slopes
dat$y = dat$y + dat$x1*dat$x2
qgcomp.glm.ee(y ~ z + x1 + x2 + I(x1^2) + I(x2*x1), family="gaussian",
 expnms = c('x1', 'x2'), data=dat, q=4, deg=2)
qgcomp.glm.boot(y ~ z + x1 + x2 + I(x1^2) + I(x2*x1), family="gaussian",
 expnms = c('x1', 'x2'), data=dat, q=4, deg=2, B=1000)

# binary outcome
dat <- data.frame(y=rbinom(50,1,0.5), x1=runif(50), x2=runif(50), z=runif(50))

# Conditional mixture OR
qgcomp.glm.noboot(y ~ z + x1 + x2, family="binomial", expnms = c('x1', 'x2'),
  data=dat, q=2)

#Marginal mixture OR (population average OR - in general, this will not equal the
# conditional mixture OR due to non-collapsibility of the OR)
qgcomp.glm.boot(y ~ z + x1 + x2, family="binomial", expnms = c('x1', 'x2'),
  data=dat, q=2, B=300, MCsize=5000, rr=FALSE)
qgcomp.glm.ee(y ~ z + x1 + x2, family="binomial", expnms = c('x1', 'x2'),
  data=dat, q=2, rr=FALSE)

# Population average mixture RR
qgcomp.glm.boot(y ~ z + x1 + x2, family="binomial", expnms = c('x1', 'x2'),
  data=dat, q=2, B=300, MCsize=5000, rr=TRUE)
qgcomp.glm.ee(y ~ z + x1 + x2, family="binomial", expnms = c('x1', 'x2'),
  data=dat, q=2, rr=TRUE)
# getting the RR with the poisson trick
qgcomp.glm.ee(y ~ z + x1 + x2, family="poisson", expnms = c('x1', 'x2'),
  data=dat, q=2)
qgcomp.glm.boot(y ~ z + x1 + x2, family="poisson", expnms = c('x1', 'x2'),
  data=dat, q=2, B=300, MCsize=5000)

# Population average mixture RR, indicator variable representation of x2
# note that I(x==...) operates on the quantile-based category of x,
# rather than the raw value
res = qgcomp.glm.ee(y ~ z + x1 + I(x2==1) + I(x2==2) + I(x2==3),
  family="binomial", expnms = c('x1', 'x2'), data=dat, q=4, rr=TRUE)
res$fit
plot(res)

# now add in a non-linear MSM
res2a = qgcomp.glm.boot(y ~ z + x1 + I(x2==1) + I(x2==2) + I(x2==3),
  family="binomial", expnms = c('x1', 'x2'), data=dat, q=4, rr=TRUE,
  degree=2)
res2 = qgcomp.glm.ee(y ~ z + x1 + I(x2==1) + I(x2==2) + I(x2==3),
  family="binomial", expnms = c('x1', 'x2'), data=dat, q=4, rr=TRUE,
  degree=2)
  
# conditional model estimates (identical point estimates and similar std. errors)
summary(res2a$fit)$coefficients
res2$fit
# msm estimates (identical point estimates and different std. errors)
summary(res2a$msmfit)$coefficients  # correct point estimates, incorrect standard errors
res2  # correct point estimates, correct standard errors
plot(res2)
# Log risk ratio per one IQR change in all exposures (not on quantile basis)
dat$x1iqr <- dat$x1/with(dat, diff(quantile(x1, c(.25, .75))))
dat$x2iqr <- dat$x2/with(dat, diff(quantile(x2, c(.25, .75))))
# note that I(x>...) now operates on the untransformed value of x,
# rather than the quantized value
res2 = qgcomp.glm.ee(y ~ z + x1iqr + I(x2iqr>0.1) + I(x2>0.4) + I(x2>0.9),
  family="binomial", expnms = c('x1iqr', 'x2iqr'), data=dat, q=NULL, rr=TRUE,
  degree=2)
res2


# weighted model
N=5000
dat4 <- data.frame(id=seq_len(N), x1=runif(N), x2=runif(N), z=runif(N))
dat4$y <- with(dat4, rnorm(N, x1*z + z, 1))
dat4$w=runif(N) + dat4$z*5
qdata = quantize(dat4, expnms = c("x1", "x2"), q=4)$data
# first equivalent models with no covariates
qgcomp.glm.noboot(f=y ~ x1 + x2, expnms = c('x1', 'x2'), data=dat4, q=4, family=gaussian())
qgcomp.glm.noboot(f=y ~ x1 + x2, expnms = c('x1', 'x2'), data=dat4, q=4, family=gaussian(),
              weights=w)

set.seed(13)
qgcomp.glm.ee(f=y ~ x1 + x2, expnms = c('x1', 'x2'), data=dat4, q=4, family=gaussian())
qgcomp.glm.ee(f=y ~ x1 + x2, expnms = c('x1', 'x2'), data=dat4, q=4, family=gaussian(),
            weights=w)
# using the correct model
set.seed(13)
qgcomp.glm.ee(f=y ~ x1*z + x2, expnms = c('x1', 'x2'), data=dat4, q=4, family=gaussian(),
            weights=w, id="id")
(qgcfit <- qgcomp.glm.ee(f=y ~ z + x1 + x2, expnms = c('x1', 'x2'), data=dat4, q=4,
                       family=gaussian(), weights=w))
qgcfit$fit
summary(glm(y ~ z + x1 + x2, data = qdata, weights=w))
}