drgee: Doubly Robust Generalized Estimating Equations

Description

drgee performs outcome nuisance model based estimation, exposure nuisance model based estimation or doubly robust estimation given symbolic representations of an outcome nuisance model and an exposure nuisance model.

Usage

drgee(oformula, eformula, iaformula = formula(~1),
      olink = c("identity", "log", "logit"),
      elink = c("identity", "log", "logit"),
      estimationMethod = c("dr", "obe", "ebe"),
      data = NULL, rootFinder = findRoots,
      clusterid = NULL, ...)

Arguments

oformula

An expression or formula for the outcome nuisance model. The outcome is identified as the response in this formula. Therefore, an LHS is required for every choice of estimationMethod.

eformula

An expression or formula for the exposure nuisance model. The exposure is identified as the response in this formula. Therefore, an LHS is required for every choice of estimationMethod.

iaformula

An expression or formula whose RHS contains the variables that "interact" with (i.e. are multiplied by) the exposure in the main model. The intercept "1" is always added. Default is no interactions, i.e. iaformula = formula(~1).

olink

A character string naming the link function in the outcome nuisance model. Has to be "identity", "log" or "logit". Default is "identity".

elink

A character string naming the link function in the exposure nuisance model. Has to be "identity", "log" or "logit". Default is "identity".

estimationMethod

A character string naming the desired estimation method. Choose "obe" for outcome nuisance model based estimation, "ebe" for exposure nuisance model based estimation or "dr" for doubly robust estimation. Default is "dr".

data

A data frame or environment containing the variables appearing in iaformula, oformula and eformula. Default is NULL, in which case data are expected to be found in the environment of the

rootFinder

A function to solve a system of nonlinear equations. Default is findRoots.

clusterid

An optional character string naming a cluster-defining variable in the data argument.

...

Further arguments to be passed to the functions geeFit and rootFinder.
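To illustrate what iaformula contributes, the base-R sketch below builds $X(L)$ with model.matrix, which adds the column of 1's automatically, and then forms the columnwise product $A\cdot X(L)$. The toy data frame d is hypothetical, and this is an illustration rather than drgee's internal code.

```r
# Hypothetical toy data: binary exposure a and one covariate l1
d <- data.frame(a = c(0, 1, 1, 0), l1 = c(0.5, -1.2, 2.0, 0.3))

# X(L) from the RHS of iaformula = ~ l1; model.matrix adds the intercept column of 1's
X <- model.matrix(~ l1, data = d)

# Columnwise product A * X(L): every column of X is multiplied by the exposure
AX <- d$a * X
colnames(AX) <- c("a", "a:l1")
AX

# Default iaformula = ~ 1: X(L) is only the column of 1's, so A * X(L) is just A
X0 <- model.matrix(~ 1, data = d)
```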

Value

drgee returns an object of class drgee containing:
coefficients: Estimates of the parameters in the main model.
vcov: Robust variance-covariance matrix of the parameter estimates.
optim.object: An estimation object returned from the function specified in rootFinder, if this function is called.
call: The matched call.
geeData: The geeData object used in the calculations.
estimationMethod: The value of the input argument estimationMethod.

The class methods coef and vcov can be used to extract the estimated parameters and their covariance matrix from a drgee object. summary.drgee produces a summary of the calculations.
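Since coef and vcov methods are available, Wald-type intervals can be assembled from any fitted object in the same way. The base-R sketch below uses an lm fit on hypothetical data purely as a stand-in so that it runs without drgee; the same two extractor calls would apply to a drgee object.

```r
# Stand-in fit on hypothetical data; any object with coef() and vcov() methods works alike
set.seed(2)
d <- data.frame(l1 = rnorm(200))
d$y <- 1 + 2 * d$l1 + rnorm(200)
fit <- lm(y ~ l1, data = d)

est <- coef(fit)                  # point estimates
se  <- sqrt(diag(vcov(fit)))      # standard errors from the covariance matrix
z   <- qnorm(0.975)               # normal quantile for a 95% interval

wald <- cbind(estimate = est, se = se,
              lower = est - z * se, upper = est + z * se,
              p.value = 2 * pnorm(-abs(est / se)))
round(wald, 3)
```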

Details

drgee estimates the parameter $\beta$ in a main model $g\{E(Y|A,L)\}=\beta^T \{A\cdot X(L)\}+Q(L)$, where $L$ is a vector of nuisance variables and $X(L)$ and $Q(L)$ are functions of $L$. Note that $A \cdot X(L)$ should be interpreted as a columnwise multiplication and that $X(L)$ always contains a column of 1's.

Given a specification of an outcome nuisance model $Q(L)=\gamma^T V(L)$ (where $V(L)$ is a function of $L$), outcome nuisance model based estimation can be performed. Alternatively, leaving $Q(L)$ unspecified and using an exposure nuisance model $h\{E(A|L)\}=\alpha^T Z(L)$ (where $h$ is a link function and $Z(L)$ is a function of $L$), exposure nuisance model based estimation can be performed. When $g$ is the logit link, the exposure nuisance model is required to be of the form $\mathrm{logit}\{E(A|Y=0,L)\}=\alpha^T Z(L)$, and the exposure needs to be binary.

Given both an outcome and an exposure nuisance model, doubly robust estimation can be performed. Doubly robust estimation gives a consistent estimate of $\beta$ when either the outcome nuisance model or the exposure nuisance model is correctly specified; both need not be.
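For the identity link the doubly robust estimating equation is linear in $\beta$, so it can be solved in closed form. The base-R sketch below assumes the standard g-estimation form $\sum_i X(L_i)\{A_i-\hat E(A|L_i)\}\{Y_i-\beta^T A_i X(L_i)-\hat\gamma^T V(L_i)\}=0$, with $\hat\gamma$ taken from an ordinary least-squares fit of the outcome model. It is an illustration, not drgee's drFit code, and the simulated data (milder confounding than in the Examples section) are hypothetical. As in the Examples, the outcome nuisance model omits $L_2$ while the exposure nuisance model is correct.

```r
# Hypothetical simulated data: true beta = (1.5, 1); l2 confounds a and y
set.seed(1)
n  <- 5000
l1 <- rnorm(n)
l2 <- rnorm(n)
a  <- rbinom(n, 1, plogis(1 + l1 + l2))
y  <- rnorm(n, 1.5 * a + 1 * a * l1 - 1 - 2 * l1 + 2 * l2)

X <- cbind(1, l1)        # X(L) for iaformula = ~ l1 (column of 1's always included)
V <- cbind(1, l1)        # V(L) for the misspecified outcome nuisance model y ~ l1

# Outcome nuisance model: OLS of Y on A*X(L) and V(L); keep the gamma coefficients
ofit  <- lm.fit(cbind(a * X, V), y)
gamma <- ofit$coefficients[ncol(X) + seq_len(ncol(V))]

# Exposure nuisance model E(A|L) (correctly specified): logistic regression
p <- fitted(glm(a ~ l1 + l2, family = binomial))

# Doubly robust estimating equation, linear in beta for the identity link:
#   sum_i X_i (A_i - p_i) (Y_i - beta' A_i X_i - gamma' V_i) = 0
U       <- X * (a - p)                                   # row i is X_i (A_i - p_i)
beta_dr <- drop(solve(crossprod(U, a * X), crossprod(U, y - V %*% gamma)))
names(beta_dr) <- c("a", "a:l1")

# Compare with the A-coefficients of the OLS fit that omits l2 (expected to be biased)
rbind(dr = beta_dr, obe = unname(ofit$coefficients[seq_len(ncol(X))]))
```

Swapping the roles (correct outcome model, misspecified exposure model) should likewise leave beta_dr approximately unbiased, which is the double robustness property described above.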

Usage is best explained through an example. Suppose that we are interested in the parameters $\beta_0$ and $\beta_1$ in a main model $\mathrm{logit}\{E(Y|A,L_1,L_2)\}=\beta_0 A + \beta_1 A \cdot L_1 + Q(L_1,L_2)$, where $L_1$ and $L_2$ are nuisance variables and $Q(L_1,L_2)$ is some (unspecified) function of $L_1$ and $L_2$.

To adjust for $L_1$ and $L_2$, we can use an outcome nuisance model $Q(L_1,L_2;\gamma)=\gamma_0 + \gamma_1 L_1$ or an exposure nuisance model $\mathrm{logit}\{E(A|Y=0,L_1,L_2)\}=\alpha_0+\alpha_1 L_1+\alpha_2 L_2$ to calculate estimates of $\beta_0$ and $\beta_1$ in the main model.
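One straightforward way to fit an exposure nuisance model of the form $\mathrm{logit}\{E(A|Y=0,L_1,L_2)\}$ is a logistic regression restricted to the observations with $Y=0$. The base-R sketch below shows this with glm and its subset argument on hypothetical binary data; whether drgee fits the model exactly this way internally is an assumption.

```r
# Hypothetical binary exposure and binary outcome
set.seed(3)
n  <- 5000
l1 <- rnorm(n)
l2 <- rnorm(n)
a  <- rbinom(n, 1, plogis(-0.5 + l1 + 0.5 * l2))
y  <- rbinom(n, 1, plogis(-1 + 0.8 * a + 0.5 * l1 - 0.5 * l2))
d  <- data.frame(y, a, l1, l2)

# logit{E(A | Y = 0, L1, L2)} = alpha0 + alpha1*L1 + alpha2*L2,
# fitted by logistic regression on the Y = 0 observations only
efit <- glm(a ~ l1 + l2, family = binomial, data = d, subset = y == 0)
coef(efit)
```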

We specify the outcome nuisance model as oformula=Y~L_1 and olink="logit". The exposure nuisance model is specified as eformula=A~L_1+L_2 and elink="logit". Since the outcome $Y$ and the exposure $A$ are identified as the LHS of oformula and eformula respectively, and since the outcome link is specified in the olink argument, the only things left to specify for the main model are the (multiplicative) interactions $X(L)=(1,L_1)^T$. This is done as iaformula=~L_1, since $1$ is always included in $X(L)$. We can then perform outcome or exposure nuisance model based estimation or doubly robust estimation by setting estimationMethod to "obe", "ebe" or "dr" respectively.
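The role assignment described above relies only on the structure of R formulas. The base-R sketch below (not drgee's own parsing code) reads the response of each nuisance formula and the terms of the one-sided iaformula; the lowercase variable names are hypothetical.

```r
oformula  <- y ~ l1
eformula  <- a ~ l1 + l2
iaformula <- ~ l1

# The response (LHS) of each nuisance formula identifies the outcome and the exposure
outcome  <- all.vars(oformula[[2]])
exposure <- all.vars(eformula[[2]])

# A one-sided formula has length 2 (no LHS); its RHS terms give X(L) besides the 1
length(iaformula)
attr(terms(iaformula), "term.labels")
attr(terms(iaformula), "intercept")    # the implicit column of 1's in X(L)
```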

When estimationMethod="obe", the RHS of eformula will be ignored with a warning message.

When estimationMethod="ebe", the RHS of oformula will be ignored with a warning message.

Outcome nuisance model based estimation is implemented for generalized estimating equation models with the identity, log or logit link and independent observations. The estimated coefficients are identical to those obtained with glm, but since no distributional assumptions are made, robust variance is calculated.
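For the identity link, outcome nuisance model based estimation amounts to a least-squares fit of $Y$ on $A\cdot X(L)$ and $V(L)$, and a robust variance can be formed with the sandwich formula. The base-R sketch below uses hypothetical heteroscedastic data and the HC0 form of the sandwich estimator; it illustrates the idea and is not a reproduction of robVcov.

```r
# Hypothetical data with heteroscedastic errors; Q(L) = -1 - 2*l1
set.seed(4)
n  <- 1000
l1 <- rnorm(n)
a  <- rbinom(n, 1, plogis(l1))
y  <- 1.5 * a + a * l1 - 1 - 2 * l1 + rnorm(n, sd = 1 + abs(l1))

# Main model terms a and a:l1, outcome nuisance model terms 1 and l1
fit <- glm(y ~ a + a:l1 + l1, family = gaussian)
Xd  <- model.matrix(fit)
e   <- residuals(fit, type = "response")

# HC0 sandwich: bread^{-1} meat bread^{-1}, with no distributional assumption on the errors
bread_inv <- solve(crossprod(Xd))
meat      <- crossprod(Xd * e)           # sum_i e_i^2 x_i x_i'
V_rob     <- bread_inv %*% meat %*% bread_inv

cbind(estimate  = coef(fit)[c("a", "a:l1")],
      robust.se = sqrt(diag(V_rob))[c("a", "a:l1")])
```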

When exposure nuisance model based estimation or doubly robust estimation is chosen with olink="logit", the exposure link will be changed to "logit" with a warning message.

Robust variance for the estimated parameter is calculated using robVcov. A cluster robust variance is calculated when a character string naming a cluster variable is supplied in the clusterid argument.
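The difference between the ordinary and the cluster robust sandwich is where the scores are summed before their outer products are taken. The base-R sketch below contrasts the two for a linear model on hypothetical data with a shared cluster effect. It applies no small-sample correction, and robVcov may differ in such details.

```r
# Hypothetical clustered data: 200 clusters of size 5 sharing a random effect u
set.seed(5)
n_clusters <- 200
size       <- 5
id <- rep(seq_len(n_clusters), each = size)       # cluster identifier
u  <- rnorm(n_clusters)[id]                       # shared cluster effect
l1 <- rnorm(n_clusters * size)
a  <- rbinom(n_clusters * size, 1, plogis(l1))
y  <- 1.5 * a - 1 - 2 * l1 + u + rnorm(n_clusters * size)

fit <- lm(y ~ a + l1)
Xd  <- model.matrix(fit)
e   <- residuals(fit)
bread_inv <- solve(crossprod(Xd))

# Independent observations: one score outer product per observation
meat_iid <- crossprod(Xd * e)

# Clusters: scores summed within each cluster, then one outer product per cluster
S_cl    <- rowsum(Xd * e, group = id)
meat_cl <- crossprod(S_cl)

se_iid <- sqrt(diag(bread_inv %*% meat_iid %*% bread_inv))
se_cl  <- sqrt(diag(bread_inv %*% meat_cl %*% bread_inv))
rbind(se_iid, se_cl)
```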

drgee calls geeData to create a geeData object containing the elements needed in the calculations. The estimation of the coefficients in the main model is performed by obeFit, ebeFit or drFit.

For exposure nuisance model based estimation when $g$ is the identity or log link, see Robins et al. (1992).
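For the log link with $X(L)=1$, a standard exposure nuisance model based estimating equation is $\sum_i\{A_i-\hat E(A|L_i)\}Y_i\exp(-\beta A_i)=0$. It is unbiased because $E\{Y\exp(-\beta A)|A,L\}=\exp\{Q(L)\}$ does not depend on $A$, so $Q(L)$ never has to be modelled. The base-R sketch below solves it with uniroot on hypothetical Poisson data with a logistic exposure model; it is not drgee's ebeFit code.

```r
# Hypothetical count outcome: log E(Y|A,L) = beta*A + Q(L), with Q(L) = -0.5 + 0.5*l1
set.seed(6)
n    <- 5000
l1   <- rnorm(n)
a    <- rbinom(n, 1, plogis(0.5 * l1))
beta <- 0.5                                   # true log rate ratio
y    <- rpois(n, exp(beta * a - 0.5 + 0.5 * l1))

# Exposure nuisance model E(A|L) fitted by logistic regression; Q(L) is left unspecified
p <- fitted(glm(a ~ l1, family = binomial))

# Estimating function for the log link with X(L) = 1
ee <- function(b) sum((a - p) * y * exp(-b * a))

# ee is decreasing in b, so a one-dimensional bracketing root finder suffices here
beta_hat <- uniroot(ee, interval = c(-5, 5))$root
beta_hat
```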

For doubly robust estimation when $g$ is the identity or log link, see Robins (1999). For doubly robust estimation when $g$ is the logit link, see Tchetgen et al. (2010).

This package was inspired by the Stata package drglm described in Orsini et al. (2013) and provides essentially the same functionality.

References

Orsini N., Bellocco R., Sjölander A. (2013), Doubly Robust Estimation in Generalized Linear Models, Stata Journal, 13(1), 185-205

Robins J.M., Mark S.D., Newey W.K. (1992), Estimating Exposure Effects by Modelling the Expectation of Exposure Conditional on Confounders, Biometrics, 48, 479-495

Robins J.M. (1999), Robust Estimation in Sequentially Ignorable Missing Data and Causal Inference Models, Proceedings of the American Statistical Association Section on Bayesian Statistical Science, 6-10

Tchetgen E.J.T., Robins J.M., Rotnitzky A. (2010), On Doubly Robust Estimation in a Semiparametric Odds Ratio Model, Biometrika, 97(1), 171-180

Examples

## Doubly robust estimation when
## the main model is
## E(Y|A,L1,L2)-E(Y|A=0,L1,L2)=beta0*A+beta1*A*L1
## and the outcome nuisance model is
## E(Y|A=0,L1,L2)=gamma0+gamma1*L1+gamma2*L2
## and the exposure nuisance model is
## E(A|L1,L2)=expit(alpha0+alpha1*L1+alpha2*L2)

library(drgee)

expit <- function(x) exp(x) / (1 + exp(x))

set.seed(1)  # for reproducibility
n <- 5000

# Nuisance variables
l1 <- rnorm(n, mean = 0, sd = 1)
l2 <- rnorm(n, mean = 0, sd = 1)

beta0 <- 1.5
beta1 <- 1
gamma0 <- -1
gamma1 <- -2
gamma2 <- 2
alpha0 <- 1
alpha1 <- 5
alpha2 <- 3

# Exposure
a <- rbinom(n, 1, expit(alpha0 + alpha1 * l1 + alpha2 * l2))
# Outcome
y <- rnorm(n, beta0 * a + beta1 * a * l1 + gamma0 + gamma1 * l1 + gamma2 * l2, sd = 1)

data <- data.frame(y, a, l1, l2)

## Outcome nuisance model misspecified (l2 omitted) and
## exposure nuisance model correctly specified

# Doubly robust estimation
dr.est <- drgee(oformula = y ~ l1, eformula = a ~ l1 + l2, iaformula = ~ l1,
                olink = "identity", elink = "logit",
                estimationMethod = "dr", data = data)
summary(dr.est)

# Outcome nuisance model based estimation
obe.est <- drgee(oformula = y ~ l1, eformula = a ~ 1, iaformula = ~ l1,
                 olink = "identity", elink = "logit",
                 estimationMethod = "obe", data = data)
summary(obe.est)

# Exposure nuisance model based estimation
ebe.est <- drgee(oformula = y ~ 1, eformula = a ~ l1 + l2, iaformula = ~ l1,
                 olink = "identity", elink = "logit",
                 estimationMethod = "ebe", data = data)
summary(ebe.est)