stdGee: Regression standardization in conditional generalized estimating equations

Description

stdGee performs regression standardization in linear and log-linear fixed effects models, at specified values of the exposure, over the sample covariate distribution. Let $Y$, $X$, and $Z$ be the outcome, the exposure, and a vector of covariates, respectively. It is assumed that data are clustered with a cluster indicator $i$. stdGee uses a fitted fixed effects model, with cluster-specific intercept $a_i$ (see Details), to estimate the standardized mean $\theta(x)=E\{E(Y|i,X=x,Z)\}$, where $x$ is a specific value of $X$, and the outer expectation is over the marginal distribution of $(a_i,Z)$.

Usage

stdGee(fit, data, X, x, clusterid, subsetnew)

Arguments

fit

an object of class "gee", with argument cond = TRUE, as returned by the gee function in the drgee package. If arguments weights and/or subset are used when fitting the model, then the same weights and subset are used in stdGee.

data

a data frame containing the variables in the model. This should be the same data frame as was used to fit the model in fit.

X

a string containing the name of the exposure variable $X$ in data.

x

an optional vector containing the specific values of $X$ at which to estimate the standardized mean. If $X$ is binary (0/1) or a factor, then x defaults to all values of $X$. If $X$ is numeric, then x defaults to the mean of $X$. If x is set to NA, then $X$ is not altered. This produces an estimate of the marginal mean $E(Y)=E\{E(Y|X,Z)\}$.

clusterid

a mandatory string containing the name of a cluster identification variable. Must be identical to the clusterid variable used in the gee call.

subsetnew

an optional logical statement specifying a subset of observations to be used in the standardization. This set is assumed to be a subset of the subset (if any) that was used to fit the regression model.

Value

An object of class "stdGee" is a list containing

call

the matched call.

input

a list containing all input arguments.

est

a vector with length equal to length(x), where element j is equal to $\hat{\theta}$(x[j]).

vcov

a square matrix with length(x) rows, where the element on row i and column j is the (estimated) covariance of $\hat{\theta}$(x[i]) and $\hat{\theta}$(x[j]).
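As an illustration of how est and vcov fit together, the sketch below computes differences between standardized means and their standard errors from a covariance matrix. The numbers are hypothetical stand-ins for the est and vcov elements of a "stdGee" object, and ref_pos is a made-up index into x used only in this sketch; it is not an argument of stdGee or of its summary method.

```r
# Hypothetical values standing in for the est and vcov elements
# of a "stdGee" object (three values of x).
est <- c(1.0, 1.5, 2.2)
V <- matrix(c(0.04, 0.01, 0.00,
              0.01, 0.05, 0.01,
              0.00, 0.01, 0.06), nrow = 3, byrow = TRUE)

# Position in x of the reference level (illustrative choice).
ref_pos <- 2

# Contrast matrix: row j picks out theta(x[j]) - theta(x[ref_pos]).
C <- diag(length(est))
C[, ref_pos] <- C[, ref_pos] - 1

# Point estimates of the differences and their covariance matrix.
diff_est <- as.vector(C %*% est)
diff_var <- C %*% V %*% t(C)
diff_se <- sqrt(diag(diff_var))

# Normal-approximation 95% confidence limits.
z <- qnorm(0.975)
ci <- cbind(estimate = diff_est,
            lower = diff_est - z * diff_se,
            upper = diff_est + z * diff_se)
print(ci)
```

The variance of each difference is the sum of the two variances minus twice the covariance, which is why the off-diagonal elements of vcov are needed and not only the diagonal.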

Details

stdGee assumes that a fixed effects model $$\eta\{E(Y|i,X,Z)\}=a_i+h(X,Z;\beta)$$ has been fitted. The link function $\eta$ is assumed to be the identity link or the log link.

The conditional generalized estimating equation (CGEE) estimate of $\beta$ is used to obtain estimates of the cluster-specific intercepts: $$\hat{a}_i=\sum_{j=1}^{n_i}r_{ij}/n_i,$$ where $$r_{ij}=Y_{ij}-h(X_{ij},Z_{ij};\hat{\beta})$$ if $\eta$ is the identity link, and $$r_{ij}=Y_{ij}\exp\{-h(X_{ij},Z_{ij};\hat{\beta})\}$$ if $\eta$ is the log link, and $(X_{ij},Z_{ij})$ is the value of $(X,Z)$ for subject $j$ in cluster $i$, $j=1,\ldots,n_i$, $i=1,\ldots,n$.

The CGEE estimate of $\beta$ and the estimate of $a_i$ are used to estimate the mean $E(Y|i,X=x,Z)$: $$\hat{E}(Y|i,X=x,Z)=\eta^{-1}\{\hat{a}_i+h(X=x,Z;\hat{\beta})\}.$$ For each $x$ in the x argument, these estimates are averaged across all subjects (i.e. all observed values of $Z$ and all estimated values of $a_i$) to produce estimates $$\hat{\theta}(x)=\sum_{i=1}^n \sum_{j=1}^{n_i} \hat{E}(Y|i,X=x,Z_{ij})/N,$$ where $N=\sum_{i=1}^n n_i$. The variance for $\hat{\theta}(x)$ is obtained by the sandwich formula.
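The point estimate described above can be sketched in base R for the identity link. This is an illustration of the formulas only, not the stdGee implementation: it assumes a linear $h(X,Z;\beta)=\beta_X X+\beta_Z Z$, and it obtains $\hat{\beta}$ by within-cluster centering with lm as a stand-in for a gee fit with cond = TRUE. The helper names center, h, and theta_hat are made up for this sketch, and the sandwich variance is not computed.

```r
set.seed(1)
n <- 200                          # number of clusters
ni <- 2                           # subjects per cluster
id <- rep(seq_len(n), each = ni)
ai <- rep(rnorm(n), each = ni)    # true cluster-specific intercepts
Z <- rnorm(n * ni)
X <- rnorm(n * ni, mean = ai + Z)
Y <- rnorm(n * ni, mean = ai + X + Z)

# Stand-in for the CGEE fit (assumption): subtracting cluster means
# removes a_i, so beta can be estimated without the intercepts.
center <- function(v) v - ave(v, id)
beta <- coef(lm(center(Y) ~ 0 + center(X) + center(Z)))
bX <- beta[[1]]
bZ <- beta[[2]]

# h(X, Z; beta-hat) for the assumed linear model.
h <- function(x, z) bX * x + bZ * z

# Identity link: r_ij = Y_ij - h(X_ij, Z_ij; beta-hat), and
# a_i-hat is the cluster mean of r_ij (repeated for each subject).
r <- Y - h(X, Z)
a_hat <- ave(r, id)

# theta-hat(x): set X = x for every subject and average over all N.
theta_hat <- function(x) mean(a_hat + h(x, Z))

x_grid <- c(-1, 0, 1)
est <- vapply(x_grid, theta_hat, numeric(1))
print(est)

# Leaving X unaltered (the x = NA case) gives the marginal mean.
marginal <- mean(a_hat + h(X, Z))
```

With the identity link, averaging the fitted values at the observed exposure reproduces the sample mean of $Y$ exactly, because the cluster means of $r_{ij}$ add back what $h$ leaves unexplained.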

References

Goetgeluk S. and Vansteelandt S. (2008). Conditional generalized estimating equations for the analysis of clustered and longitudinal data. Biometrics 64(3), 772-780.

Martin R.S. (2017). Estimation of average marginal effects in multiplicative unobserved effects panel models. Economics Letters 160, 16-19.

Sjolander A. (2019). Estimation of marginal causal effects in the presence of confounding by cluster. Biostatistics doi: 10.1093/biostatistics/kxz054

Examples

# NOT RUN {
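library(stdReg)  # provides stdGee
library(drgee)   # provides gee with the cond argument

# Simulate n clusters of size ni that share a cluster-level effect ai
n <- 1000
ni <- 2
id <- rep(1:n, each=ni)
ai <- rep(rnorm(n), each=ni)
Z <- rnorm(n*ni)
X <- rnorm(n*ni, mean=ai+Z)
Y <- rnorm(n*ni, mean=ai+X+Z+0.1*X^2)
dd <- data.frame(id, Z, X, Y)

# Fit the fixed effects model by conditional GEE
fit <- gee(formula=Y~X+Z+I(X^2), data=dd, clusterid="id", link="identity",
  cond=TRUE)

# Standardize at a grid of exposure values
fit.std <- stdGee(fit=fit, data=dd, X="X", x=seq(-3,3,0.5), clusterid="id")

# Summarize as differences relative to the reference, then plot
print(summary(fit.std, contrast="difference", reference=2))
plot(fit.std)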

# }
