stdGee performs regression standardization in linear and log-linear
fixed effects models, at specified values of the exposure, over the sample
covariate distribution. Let \(Y\), \(X\), and \(Z\) be the outcome,
the exposure, and a vector of covariates, respectively. It is assumed that data
are clustered with a cluster indicator \(i\). stdGee uses a
fitted fixed effects model, with cluster-specific intercept \(a_i\)
(see details), to estimate the standardized mean
\(\theta(x)=E\{E(Y|i,X=x,Z)\}\), where \(x\) is a specific value of \(X\),
and the outer expectation is over the marginal distribution of \((a_i,Z)\).
stdGee(fit, data, X, x, clusterid, subsetnew)
an object of class "gee", with argument cond = TRUE, as returned by the gee function in the drgee package. If arguments weights and/or subset are used when fitting the model, then the same weights and subset are used in stdGee.
a data frame containing the variables in the model. This should be the same data frame as was used to fit the model in fit.
a string containing the name of the exposure variable \(X\) in data.
an optional vector containing the specific values of \(X\) at which to estimate the standardized mean. If \(X\) is binary (0/1) or a factor, then x defaults to all values of \(X\). If \(X\) is numeric, then x defaults to the mean of \(X\). If x is set to NA, then \(X\) is not altered. This produces an estimate of the marginal mean \(E(Y)=E\{E(Y|X,Z)\}\).
a mandatory string containing the name of a cluster identification variable. Must be identical to the clusterid variable used in the gee call.
an optional logical statement specifying a subset of observations to be used in the standardization. This set is assumed to be a subset of the subset (if any) that was used to fit the regression model.
An object of class "stdGee" is a list containing
the matched call.
input is a list containing all input arguments.
a vector with length equal to length(x), where element j is equal to \(\hat{\theta}\)(x[j]).
a square matrix with length(x) rows, where the element on row i and column j is the (estimated) covariance of \(\hat{\theta}\)(x[i]) and \(\hat{\theta}\)(x[j]).
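As a hedged illustration of how a vector of estimates and its covariance matrix can be used together, the base R sketch below computes pointwise Wald intervals and difference contrasts against a reference value. The objects est and V and all numbers in them are illustrative placeholders, not output of stdGee, and this is not a description of how the package's summary method is implemented.

```r
# Illustrative placeholders (NOT stdGee output): estimates at three exposure
# values and their covariance matrix
est <- c(1.0, 1.5, 2.2)
V <- matrix(c(0.040, 0.010, 0.005,
              0.010, 0.030, 0.010,
              0.005, 0.010, 0.050), nrow = 3, byrow = TRUE)

# Pointwise 95% Wald intervals from the diagonal of V
se <- sqrt(diag(V))
ci <- cbind(lower = est - qnorm(0.975) * se,
            upper = est + qnorm(0.975) * se)

# Difference contrasts against a reference value, here the second one
ref <- 2
diff_est <- est - est[ref]
# Var(a - b) = Var(a) + Var(b) - 2 * Cov(a, b)
diff_se <- sqrt(diag(V) + V[ref, ref] - 2 * V[, ref])

cbind(est, ci, diff_est, diff_se)
```

The off-diagonal elements matter here: ignoring the covariance between two estimates would misstate the standard error of their difference.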
stdGee assumes that a fixed effects model
$$\eta\{E(Y|i,X,Z)\}=a_i+h(X,Z;\beta)$$
has been fitted. The link function \(\eta\) is assumed to be the identity link
or the log link. The conditional generalized estimating equation (CGEE)
estimate of \(\beta\) is used to obtain estimates of the cluster-specific
means:
$$\hat{a}_i=\sum_{j=1}^{n_i}r_{ij}/n_i,$$
where
$$r_{ij}=Y_{ij}-h(X_{ij},Z_{ij};\hat{\beta})$$
if \(\eta\) is the identity link, and
$$r_{ij}=Y_{ij}\exp\{-h(X_{ij},Z_{ij};\hat{\beta})\}$$
if \(\eta\) is the log link, and \((X_{ij},Z_{ij})\) is the value of
\((X,Z)\) for subject \(j\) in cluster \(i\), \(j=1,...,n_i\),
\(i=1,...,n\). The CGEE estimate of \(\beta\) and the estimate of
\(a_i\) are used to estimate the mean \(E(Y|i,X=x,Z)\):
$$\hat{E}(Y|i,X=x,Z)=\eta^{-1}\{\hat{a}_i+h(X=x,Z;\hat{\beta})\}.$$
For each \(x\) in the x argument, these estimates are averaged across
all subjects (i.e. all observed values of \(Z\) and all estimated values of
\(a_i\)) to produce estimates
$$\hat{\theta}(x)=\sum_{i=1}^n \sum_{j=1}^{n_i} \hat{E}(Y|i,X=x,Z_{ij})/N,$$
where \(N=\sum_{i=1}^n n_i\). The variance for \(\hat{\theta}(x)\) is
obtained by the sandwich formula.
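The point estimate described above can be sketched in base R for the identity link. This is a minimal illustration, not the stdGee implementation: beta_hat is obtained by least squares on cluster-centered variables as a stand-in for the CGEE estimate (an assumption of this sketch), \(h\) is assumed linear in \(X\) and \(Z\), all object names are hypothetical, and the sandwich variance is not computed.

```r
# Simulated clustered data (similar to the examples, without the X^2 term)
set.seed(1)
n <- 200
ni <- 2
id <- rep(1:n, each = ni)
ai <- rep(rnorm(n), each = ni)
Z <- rnorm(n * ni)
X <- rnorm(n * ni, mean = ai + Z)
Y <- rnorm(n * ni, mean = ai + X + Z)

# Stand-in for the CGEE estimate of beta (assumption of this sketch):
# least squares on variables centered within cluster, which removes a_i
Yc <- Y - ave(Y, id)
Xc <- X - ave(X, id)
Zc <- Z - ave(Z, id)
beta_hat <- unname(coef(lm(Yc ~ 0 + Xc + Zc)))

# h(X, Z; beta), assumed linear here
h <- function(X, Z) beta_hat[1] * X + beta_hat[2] * Z

# r_ij = Y_ij - h(X_ij, Z_ij; beta_hat); a_hat_i is the cluster mean of r_ij,
# repeated once per subject in the cluster
r <- Y - h(X, Z)
a_hat <- ave(r, id)

# theta_hat(x): set X = x for every subject and average over all N subjects
x_values <- c(-1, 0, 1)
theta_hat <- sapply(x_values, function(x) mean(a_hat + h(x, Z)))
theta_hat
```

In this linear sketch, leaving \(X\) unaltered gives mean(a_hat + h(X, Z)), which equals mean(Y); this mirrors the case where x is set to NA.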
Goetgeluk S. and Vansteelandt S. (2008). Conditional generalized estimating equations for the analysis of clustered and longitudinal data. Biometrics 64(3), 772-780.
Martin R.S. (2017). Estimation of average marginal effects in multiplicative unobserved effects panel models. Economics Letters 160, 16-19.
Sjolander A. (2019). Estimation of marginal causal effects in the presence of confounding by cluster. Biostatistics doi: 10.1093/biostatistics/kxz054
# NOT RUN {
library(stdReg)  # provides stdGee
library(drgee)   # provides gee
n <- 1000
ni <- 2
id <- rep(1:n, each=ni)
ai <- rep(rnorm(n), each=ni)
Z <- rnorm(n*ni)
X <- rnorm(n*ni, mean=ai+Z)
Y <- rnorm(n*ni, mean=ai+X+Z+0.1*X^2)
dd <- data.frame(id, Z, X, Y)
fit <- gee(formula=Y~X+Z+I(X^2), data=dd, clusterid="id", link="identity",
cond=TRUE)
fit.std <- stdGee(fit=fit, data=dd, X="X", x=seq(-3,3,0.5), clusterid="id")
print(summary(fit.std, contrast="difference", reference=2))
plot(fit.std)
# }