standardize_gee performs regression standardization in linear and log-linear fixed effects models, at specified values of the exposure, over the sample covariate distribution. Let \(Y\), \(X\), and \(Z\) be the outcome, the exposure, and a vector of covariates, respectively. It is assumed that data are clustered with a cluster indicator \(i\). standardize_gee uses a fitted fixed effects model, with cluster-specific intercept \(a_i\) (see details), to estimate the standardized mean \(\theta(x)=E\{E(Y|i,X=x,Z)\}\), where \(x\) is a specific value of \(X\), and the outer expectation is over the marginal distribution of \((a_i,Z)\).
standardize_gee(
  formula,
  link = "identity",
  data,
  values,
  clusterid,
  case_control = FALSE,
  ci_level = 0.95,
  ci_type = "plain",
  contrasts = NULL,
  family = "gaussian",
  reference = NULL,
  transforms = NULL
)
An object of class std_glm. Obtain numeric results in a data frame with the tidy.std_glm function. This is a list with the following components:
An unnamed list with one element for each of the requested contrasts. Each element is itself a list with the elements:
  Estimated counterfactual means and standard errors for each exposure level
  Estimated covariance matrix of counterfactual means
  The estimated regression model for the outcome
  The estimated exposure model
  A character vector of the exposure variable names
  Data.frame of the estimates of the contrast with inference
  The transform argument used for this contrast
  The requested contrast type
  The reference level of the exposure
  Confidence interval type
  Confidence interval level
A named list with the elements:
  Estimated counterfactual means and standard errors for each exposure level
  Estimated covariance matrix of counterfactual means
  The estimated regression model for the outcome
  The estimated exposure model
  A character vector of the exposure variable names
formula: A formula to be used with "gee" in the drgee package.
link: The link function to be used with "gee".
data: The data.
values: A named list or data.frame specifying the variables and values at which marginal means of the outcome will be estimated.
clusterid: An optional string containing the name of a cluster identification variable when data are clustered.
case_control: Whether the data comes from a case-control study.
ci_level: Coverage probability of confidence intervals.
ci_type: A string, indicating the type of confidence intervals. Either "plain", which gives untransformed intervals, or "log", which gives log-transformed intervals.
contrasts: A vector of contrasts in the following format: If set to "difference" or "ratio", then \(\psi(x)-\psi(x_0)\) or \(\psi(x) / \psi(x_0)\) are constructed, where \(x_0\) is a reference level specified by the reference argument. Must be NULL if no reference levels are specified.
family: The family argument which is used to fit the glm model for the outcome.
reference: A vector of reference levels in the following format: If contrasts is not NULL, the desired reference level(s). This must be a vector or list the same length as contrasts, and if not named, it is assumed that the order is as specified in contrasts.
transforms: A vector of transforms in the following format: If set to "log", "logit", or "odds", the standardized mean \(\theta(x)\) is transformed into \(\psi(x)=\log\{\theta(x)\}\), \(\psi(x)=\log[\theta(x)/\{1-\theta(x)\}]\), or \(\psi(x)=\theta(x)/\{1-\theta(x)\}\), respectively. If the vector is NULL, then \(\psi(x)=\theta(x)\).
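The arithmetic behind the transforms and contrasts arguments can be illustrated with a small base-R sketch. The two standardized means below are made-up numbers, and the helper names (psi, contrast_table) are hypothetical; nothing here calls standardize_gee itself, so this only shows how \(\psi(x)\) and the two contrast types combine.

```r
# Hypothetical standardized means theta(x0) at the reference level and theta(x)
# at a comparison level (illustrative numbers, not output from standardize_gee).
theta_x0 <- 0.20
theta_x  <- 0.50

# The transformations psi described for the transforms argument;
# "identity" stands for transforms = NULL.
psi <- list(
  identity = function(theta) theta,
  log      = function(theta) log(theta),
  logit    = function(theta) log(theta / (1 - theta)),
  odds     = function(theta) theta / (1 - theta)
)

# "difference" gives psi(x) - psi(x0); "ratio" gives psi(x) / psi(x0).
contrast_table <- data.frame(
  transform  = names(psi),
  difference = sapply(psi, function(f) f(theta_x) - f(theta_x0)),
  ratio      = sapply(psi, function(f) f(theta_x) / f(theta_x0)),
  row.names  = NULL
)
print(contrast_table)
```

Some combinations have familiar interpretations when the outcome is binary: the "odds" transform with a "ratio" contrast is an odds ratio, and the "log" transform with a "difference" contrast is a log risk ratio.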
Arvid Sjölander.
standardize_gee assumes that a fixed effects model $$\eta\{E(Y|i,X,Z)\}=a_i+h(X,Z;\beta)$$ has been fitted. The link function \(\eta\) is assumed to be the identity link or the log link. The conditional generalized estimating equation (CGEE) estimate of \(\beta\) is used to obtain estimates of the cluster-specific means: $$\hat{a}_i=\sum_{j=1}^{n_i}r_{ij}/n_i,$$ where $$r_{ij}=Y_{ij}-h(X_{ij},Z_{ij};\hat{\beta})$$ if \(\eta\) is the identity link, and $$r_{ij}=Y_{ij}\exp\{-h(X_{ij},Z_{ij};\hat{\beta})\}$$ if \(\eta\) is the log link. Here \((X_{ij},Z_{ij})\) is the value of \((X,Z)\) for subject \(j\) in cluster \(i\), \(j=1,...,n_i\), \(i=1,...,n\). The CGEE estimate of \(\beta\) and the estimate of \(a_i\) are used to estimate the mean \(E(Y|i,X=x,Z)\): $$\hat{E}(Y|i,X=x,Z)=\eta^{-1}\{\hat{a}_i+h(X=x,Z;\hat{\beta})\}.$$ For each \(x\) in the values argument, these estimates are averaged across all subjects (i.e. all observed values of \(Z\) and all estimated values of \(a_i\)) to produce estimates $$\hat{\theta}(x)=\sum_{i=1}^n \sum_{j=1}^{n_i} \hat{E}(Y|i,X=x,Z_{ij})/N,$$ where \(N=\sum_{i=1}^n n_i\). The variance for \(\hat{\theta}(x)\) is obtained by the sandwich formula.
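The point estimate described above can be sketched in base R for the identity link. This is a rough illustration under stated assumptions, not the package's implementation: \(\beta\) is estimated here by least squares on within-cluster centered data, used as a stand-in for the CGEE fit from drgee; the working model \(h(X,Z;\beta)=\beta_1 X+\beta_2 Z+\beta_3 X^2\) and all object names (H, center, a_hat, theta_hat) are hypothetical; and the sandwich variance is not computed.

```r
# Simulated clustered data: n clusters of size ni with a shared intercept a_i.
set.seed(4)
n  <- 300
ni <- 2
id <- rep(1:n, each = ni)
ai <- rep(rnorm(n), each = ni)
Z  <- rnorm(n * ni)
X  <- rnorm(n * ni, mean = ai + Z)
Y  <- rnorm(n * ni, mean = ai + X + Z + 0.1 * X^2)

# Design matrix for h(X, Z; beta); no intercept, since a_i absorbs it.
H <- cbind(X = X, Z = Z, X2 = X^2)

# Assumption: within-cluster centering plus least squares stands in for the
# CGEE estimate of beta under the identity link.
center   <- function(v) v - ave(v, id)
beta_hat <- lm.fit(apply(H, 2, center), center(Y))$coefficients

# a_i hat: cluster mean of r_ij = Y_ij - h(X_ij, Z_ij; beta_hat),
# repeated for every subject in the cluster.
r     <- Y - drop(H %*% beta_hat)
a_hat <- ave(r, id)

# theta hat(x): set X = x for every subject, keep the observed Z_ij and the
# estimated a_i, then average over all N subjects.
theta_hat <- function(x) {
  Hx <- cbind(x, Z, x^2)
  mean(a_hat + drop(Hx %*% beta_hat))
}

xs    <- c(-1, 0, 1)
theta <- sapply(xs, theta_hat)
print(data.frame(x = xs, theta = theta))
```

In this simulation the data-generating mean is \(a_i + X + Z + 0.1X^2\) with \(a_i\) and \(Z\) centered at zero, so the estimates should lie near \(x + 0.1x^2\), up to sampling error.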
Goetgeluk S. and Vansteelandt S. (2008). Conditional generalized estimating equations for the analysis of clustered and longitudinal data. Biometrics 64(3), 772-780.
Martin R.S. (2017). Estimation of average marginal effects in multiplicative unobserved effects panel models. Economics Letters 160, 16-19.
Sjölander A. (2019). Estimation of marginal causal effects in the presence of confounding by cluster. Biostatistics doi: 10.1093/biostatistics/kxz054
require(drgee)

# Simulate clustered data: n clusters of size ni that share a cluster effect ai
set.seed(4)
n <- 300
ni <- 2
id <- rep(1:n, each = ni)
ai <- rep(rnorm(n), each = ni)
Z <- rnorm(n * ni)
X <- rnorm(n * ni, mean = ai + Z)
Y <- rnorm(n * ni, mean = ai + X + Z + 0.1 * X^2)
dd <- data.frame(id, Z, X, Y)

# Standardized mean of Y over a grid of exposure values, clustering on id
fit.std <- standardize_gee(
  formula = Y ~ X + Z + I(X^2),
  link = "identity",
  data = dd,
  values = list(X = seq(-3, 3, 0.5)),
  clusterid = "id"
)
print(fit.std)
plot(fit.std)