drgee

drgee performs outcome nuisance model based estimation, exposure nuisance model based estimation or doubly robust estimation, given symbolic representations of an outcome nuisance model and an exposure nuisance model.

drgee(oformula, eformula, iaformula = formula(~1),
      olink = c("identity", "log", "logit"),
      elink = c("identity", "log", "logit"),
      estimationMethod = c("dr", "obe", "ebe"),
      data = NULL, rootFinder = findRoots,
      clusterid = NULL, ...)
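oformula: A formula for the outcome nuisance model. Its LHS identifies the outcome $Y$; whether its RHS is used depends on estimationMethod.
eformula: A formula for the exposure nuisance model. Its LHS identifies the exposure $A$; whether its RHS is used depends on estimationMethod.
iaformula: A formula whose RHS contains the variables that interact multiplicatively with the exposure in the main model, i.e. the terms of $X(L)$. A column of 1's is always included. Default is iaformula=formula(~1), i.e. no interactions.
olink: A character string naming the link function $g$ of the main model and the outcome nuisance model. Has to be "identity", "log" or "logit". Default is "identity".
elink: A character string naming the link function $h$ of the exposure nuisance model. Has to be "identity", "log" or "logit". Default is "identity".
estimationMethod: A character string naming the estimation method: "obe" for outcome nuisance model based estimation, "ebe" for exposure nuisance model based estimation or "dr" for doubly robust estimation. Default is "dr".
data: A data frame containing the variables in iaformula, oformula and eformula. Default is NULL, in which case the variables are expected to be found in the enclosing environment.
rootFinder: A function for solving the nonlinear estimating equations. Default is findRoots.
clusterid: A character string naming a cluster variable in the data argument. When supplied, a cluster robust variance is calculated.
...: Further arguments passed to geeFit and rootFinder.

drgee returns an object of class drgee containing, among other elements, the output of rootFinder (if this function is called), the geeData object used in the calculations and the chosen estimationMethod. The functions coef and vcov can be used to extract the estimated parameters and their covariance matrix from a drgee object, and summary.drgee produces a summary of the calculations.

drgee estimates the parameter $\beta$ in a main model $g\{E(Y|A,L)\}=\beta^T\{A\cdot X(L)\}+Q(L)$, where $Y$ is the outcome, $A$ is the exposure, $L$ is a vector of nuisance variables, and $X(L)$ and $Q(L)$ are functions of $L$. Note that $A\cdot X(L)$ should be interpreted as a columnwise multiplication and that $X(L)$ will always contain a column of 1's.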
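The base R sketch below makes the columnwise product concrete. It builds $X(L)$ for the interaction formula ~l1 with model.matrix, which adds the column of 1's automatically. It then multiplies every column by $A$. The toy data frame and the names X and AX are made up for illustration, and this is not claimed to be drgee's internal code.

```r
# Toy data, made up for illustration only
dat <- data.frame(a = c(1, 0, 1), l1 = c(0.5, -1.2, 2.0))

# X(L) for the interaction formula ~l1: model.matrix adds the column of 1's
X <- model.matrix(~ l1, data = dat)
attr(X, "assign") <- NULL  # drop model.matrix bookkeeping for cleaner printing

# Columnwise product A * X(L): row i of X(L) is multiplied by A_i
AX <- dat$a * X
colnames(AX) <- c("a", "a:l1")
print(AX)
```

Rows with $A=0$ contribute nothing to $\beta^T\{A\cdot X(L)\}$. On the scale of $g$, $\beta_0$ is therefore the exposure effect at $L_1=0$, and $\beta_1$ is the change in that effect per unit of $L_1$.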
Given a specification of an outcome nuisance model $Q(L)=\gamma^T V(L)$ (where $V(L)$ is a function of $L$), outcome nuisance model based estimation can be performed. Alternatively, exposure nuisance model based estimation can be performed by leaving $Q(L)$ unspecified and using an exposure nuisance model $h\{E(A|L)\}=\alpha^T Z(L)$ (where $h$ is a link function and $Z(L)$ is a function of $L$). When $g$ is the logit link, the exposure nuisance model is required to be of the form $\mathrm{logit}\{E(A|Y=0,L)\}=\alpha^T Z(L)$, and the exposure needs to be binary. Given both an outcome and an exposure nuisance model, doubly robust estimation can be performed. Doubly robust estimation gives a consistent estimate of the parameter $\beta$ when either the outcome nuisance model or the exposure nuisance model is correctly specified; both need not be.

Usage is best explained through an example. Suppose that we are interested in the parameters $\beta_0$ and $\beta_1$ in a main model $\mathrm{logit}\{E(Y|A,L_1,L_2)\}=\beta_0 A + \beta_1 A \cdot L_1 + Q(L_1,L_2)$, where $L_1$ and $L_2$ are nuisance variables and $Q(L_1,L_2)$ is some (unspecified) function of $L_1$ and $L_2$.
To adjust for $L_1$ and $L_2$, we can use an outcome nuisance model $Q(L_1,L_2;\gamma)=\gamma_0 + \gamma_1 L_1$ or an exposure nuisance model $\mathrm{logit}\{E(A|Y=0,L_1,L_2)\}=\alpha_0+\alpha_1 L_1+\alpha_2 L_2$ to calculate estimates of $\beta_0$ and $\beta_1$ in the main model.
We specify the outcome nuisance model as oformula=Y~L_1 and olink="logit". The exposure nuisance model is specified as eformula=A~L_1+L_2 and elink="logit".
The outcome $Y$ and the exposure $A$ are identified as the LHS of oformula and eformula, respectively, and the outcome link is specified in the olink argument. The only thing left to specify for the main model is therefore the (multiplicative) interactions $X(L)=(1,L_1)^T$. This is done with iaformula=~L_1, since $1$ is always included in $X(L)$.
We can then perform outcome nuisance model based estimation, exposure nuisance model based estimation or doubly robust estimation by setting estimationMethod to "obe", "ebe" or "dr", respectively.
When estimationMethod="obe", the RHS of eformula will be ignored with a warning message. When estimationMethod="ebe", the RHS of oformula will be ignored with a warning message.
Outcome nuisance model based estimation is implemented for generalized estimating equation models with the identity, log or logit link and independent observations. The estimated coefficients are identical to those obtained with glm. Because no distributional assumptions are made, a robust variance is calculated.
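For the identity link, the statement above can be illustrated with a least-squares fit followed by a sandwich variance. The sketch below rests on several assumptions. It uses simulated heteroscedastic data. The main model terms are $A$ and $A\cdot L_1$, with outcome nuisance model terms $1$ and $L_1$. The variance is the plain HC0 sandwich without small-sample corrections, so robVcov may differ in such details.

```r
set.seed(2)
n <- 500
l1 <- rnorm(n)
a <- rbinom(n, 1, 0.5)
# Heteroscedastic errors, so the model-based and robust variances differ
y <- rnorm(n, mean = 1.5 * a + a * l1 - 1 - 2 * l1, sd = 1 + abs(l1))

# Columns A*X(L) = (A, A*L1) followed by V(L) = (1, L1)
D <- cbind(a = a, "a:l1" = a * l1, "(Intercept)" = 1, l1 = l1)

# Point estimates from glm with the identity link (gaussian family)
fit <- glm(y ~ D - 1, family = gaussian)
e <- y - fitted(fit)

# HC0 sandwich: (D'D)^{-1} (sum_i e_i^2 d_i d_i') (D'D)^{-1}
bread <- solve(crossprod(D))
meat <- crossprod(D * e)
vcov_robust <- bread %*% meat %*% bread

cbind(estimate = coef(fit), robust_se = sqrt(diag(vcov_robust)))
```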
When exposure nuisance model based estimation or doubly robust estimation is chosen with olink="logit", the exposure link will be changed to "logit" with a warning message.
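When $g$ is the logit link, the exposure nuisance model conditions on $Y=0$. One way to estimate $\alpha$ is therefore a logistic regression restricted to observations with $Y=0$. The sketch below shows only that fitting step, and it is an assumption about one possible approach, not a description of what drgee does internally. The simulated data are made up, and $\mathrm{logit}\{E(A|Y=0,L)\}$ is not exactly linear in them.

```r
set.seed(3)
n <- 2000
l1 <- rnorm(n)
l2 <- rnorm(n)
a <- rbinom(n, 1, plogis(0.5 * l1 + 0.5 * l2))   # binary exposure
y <- rbinom(n, 1, plogis(-0.5 + a + 0.5 * l1))   # binary outcome
dat <- data.frame(y, a, l1, l2)

# Exposure nuisance model logit{E(A | Y = 0, L1, L2)} = alpha0 + alpha1*L1 + alpha2*L2,
# fitted on the Y = 0 subset only
efit <- glm(a ~ l1 + l2, family = binomial, data = subset(dat, y == 0))
alpha_hat <- coef(efit)

# Fitted E(A | Y = 0, L) for every observation, including those with Y = 1
p0 <- predict(efit, newdata = dat, type = "response")
summary(p0)
```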
Robust variance for the estimated parameter is calculated using robVcov. A cluster robust variance is calculated when a character string naming a cluster variable is supplied in the clusterid argument.
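A cluster robust variance replaces the per-observation terms in the middle of the sandwich by sums over clusters. The sketch below illustrates this idea for a least-squares fit on simulated clustered data. The helper name cluster_vcov is hypothetical. It omits any small-sample correction, so it is not claimed to reproduce robVcov.

```r
# Hypothetical helper: cluster robust sandwich for a least-squares fit
# D: design matrix, e: residuals, cluster: cluster id for each row
cluster_vcov <- function(D, e, cluster) {
  bread <- solve(crossprod(D))
  # Sum the score contributions d_i * e_i within each cluster
  U <- rowsum(D * e, group = cluster)
  bread %*% crossprod(U) %*% bread
}

set.seed(4)
n_clusters <- 50
m <- 10                                    # observations per cluster
id <- rep(seq_len(n_clusters), each = m)
u <- rnorm(n_clusters)[id]                 # shared cluster effect
l1 <- rnorm(n_clusters * m)
a <- rbinom(n_clusters * m, 1, 0.5)
y <- 1.5 * a + a * l1 - 1 - 2 * l1 + u + rnorm(n_clusters * m)

D <- cbind(a = a, "a:l1" = a * l1, "(Intercept)" = 1, l1 = l1)
e <- drop(y - D %*% solve(crossprod(D), crossprod(D, y)))

v_cluster <- cluster_vcov(D, e, id)
# With one observation per cluster the formula reduces to the HC0 sandwich
v_hc0 <- cluster_vcov(D, e, seq_along(e))
rbind(cluster = sqrt(diag(v_cluster)), hc0 = sqrt(diag(v_hc0)))
```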
drgee calls geeData to create a geeData object containing the elements needed in the calculations. The coefficients in the main model are estimated by obeFit, ebeFit or drFit.
For exposure nuisance model based estimation when $g$ is the identity or log link, see Robins et al. (1992).
For doubly robust estimation when $g$ is the identity or log link, see Robins (1999). For doubly robust estimation when $g$ is the logit link, see Tchetgen et al. (2010).
This package was inspired by the Stata package drglm described in Orsini et al. (2013), and it provides essentially the same functionality.
Robins J.M., Mark S.D., Newey W.K. (1992), Estimating Exposure Effects by Modelling the Expectation of Exposure Conditional on Confounders, Biometrics, 48, 479–495.
Robins J.M. (1999), Robust Estimation in Sequentially Ignorable Missing Data and Causal Inference Models, Proceedings of the American Statistical Association Section on Bayesian Statistical Science, pp. 6–10.
Tchetgen E.J.T., Robins J.M., Rotnitzky A. (2010), On Doubly Robust Estimation in a Semiparametric Odds Ratio Model, Biometrika, 97(1), 171–180.
obeFit for outcome nuisance model based estimation, ebeFit for exposure nuisance model based estimation, drFit for doubly robust estimation of the parameters in the main model, drgeeData for data preparation, findRoots for nonlinear equation solving and robVcov for estimation of variance.

## Doubly robust estimation when
## the main model is
## E(Y|A,L1,L2)-E(Y|A=0,L1,L2)=beta0*A+beta1*A*L1
## and the outcome nuisance model is
## E(Y|A=0,L1,L2)=gamma0+gamma1*L1+gamma2*L2
## and the exposure nuisance model is
## E(A|L1,L2)=expit(alpha0+alpha1*L1+alpha2*L2)
library(drgee)
expit <- function(x) exp(x) / (1 + exp(x))
set.seed(1)  # for reproducibility
n <- 5000
# Nuisance variables
l1 <- rnorm(n, mean = 0, sd = 1)
l2 <- rnorm(n, mean = 0, sd = 1)
# True parameter values
beta0 <- 1.5
beta1 <- 1
gamma0 <- -1
gamma1 <- -2
gamma2 <- 2
alpha0 <- 1
alpha1 <- 5
alpha2 <- 3
# Exposure
a <- rbinom(n, 1, expit(alpha0 + alpha1 * l1 + alpha2 * l2))
# Outcome
y <- rnorm(n, beta0 * a + beta1 * a * l1 + gamma0 + gamma1 * l1 + gamma2 * l2, sd = 1)
data <- data.frame(y, a, l1, l2)
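## outcome nuisance model misspecified and
## exposure nuisance model correctly specified
# Doubly robust estimation
dr.est <- drgee(oformula = y ~ l1, eformula = a ~ l1 + l2, iaformula = ~ l1,
                olink = "identity", elink = "logit", estimationMethod = "dr", data = data)
summary(dr.est)
# Outcome nuisance model based estimation
obe.est <- drgee(oformula = y ~ l1, eformula = a ~ 1, iaformula = ~ l1,
                 olink = "identity", elink = "logit", estimationMethod = "obe", data = data)
summary(obe.est)
# Exposure nuisance model based estimation
ebe.est <- drgee(oformula = y ~ 1, eformula = a ~ l1 + l2, iaformula = ~ l1,
                 olink = "identity", elink = "logit", estimationMethod = "ebe", data = data)
summary(ebe.est)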
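In the identity-link setting of the example, a doubly robust estimate can also be computed by hand in base R. The sketch below solves one possible doubly robust estimating equation, $\sum_i X(L_i)\{A_i-\hat E(A|L_i)\}\{Y_i-\hat\gamma^T V(L_i)-\beta^T A_i X(L_i)\}=0$. Here $\hat\gamma$ comes from the outcome nuisance model fit and $\hat E(A|L)$ from a logistic exposure nuisance model. Several choices are assumptions made for illustration: $X(L)$ as the weight in the equation, the milder exposure model and the larger sample size. This is not claimed to be how drFit is implemented.

```r
set.seed(5)
expit <- function(x) exp(x) / (1 + exp(x))
n <- 20000
l1 <- rnorm(n)
l2 <- rnorm(n)
# Milder exposure model than in the example above (alpha = (0, 1, 1))
a <- rbinom(n, 1, expit(l1 + l2))
# Outcome with beta0 = 1.5, beta1 = 1 and gamma = (-1, -2, 2)
y <- rnorm(n, 1.5 * a + 1 * a * l1 - 1 - 2 * l1 + 2 * l2, sd = 1)
dat <- data.frame(y, a, l1, l2)

X <- model.matrix(~ l1, data = dat)  # X(L) = (1, L1), as with iaformula = ~l1
V <- model.matrix(~ l1, data = dat)  # V(L) = (1, L1): omits L2, so misspecified
AX <- dat$a * X                      # columnwise product A * X(L)
k <- ncol(X)

# Outcome nuisance model based fit: least squares of Y on A*X(L) and V(L)
coef_obe <- unname(lm.fit(cbind(AX, V), dat$y)$coefficients)
beta_obe <- coef_obe[seq_len(k)]
gamma_hat <- coef_obe[-seq_len(k)]

# Exposure nuisance model (correctly specified): logistic regression of A on L1, L2
p_hat <- as.numeric(fitted(glm(a ~ l1 + l2, family = binomial, data = dat)))

# Doubly robust estimating equation, linear in beta:
# sum_i X_i (A_i - p_i) (Y_i - gamma' V_i - beta' A_i X_i) = 0
W <- X * (dat$a - p_hat)
beta_dr <- unname(drop(solve(crossprod(W, AX), crossprod(W, dat$y - V %*% gamma_hat))))

rbind(obe = beta_obe, dr = beta_dr)
```

The outcome nuisance model here leaves out $L_2$, but the exposure nuisance model is correctly specified, so beta_dr should be close to the true values (1.5, 1). beta_obe is generally biased, because $L_2$ affects both $A$ and $Y$.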