stdGlm: Regression standardization in generalized linear models

Description

stdGlm performs regression standardization in generalized linear models, at specified values of the exposure, over the sample covariate distribution. Let $Y$, $X$, and $Z$ be the outcome, the exposure, and a vector of covariates, respectively. stdGlm uses a fitted generalized linear model to estimate the standardized mean $\theta(x)=E\{E(Y|X=x,Z)\}$, where $x$ is a specific value of $X$, and the outer expectation is over the marginal distribution of $Z$.

Usage

stdGlm(fit, data, X, x, clusters, case.control = FALSE)

Arguments

fit

an object of class "glm", as returned by the glm function in the stats package.

data

a data frame containing the variables in the model. This should be the same data frame as was used to fit the model in fit.

X

a string containing the name of the exposure variable $X$ in data.

x

an optional vector containing the specific values of $X$ at which to estimate the standardized mean. If $X$ is binary (0/1) or a factor, then x defaults to all values of $X$. If $X$ is numeric, then x defaults to the mean of $X$.

clusters

an optional string containing the name of a cluster identification variable when data are clustered.

case.control

logical. Do data come from a case-control study? Defaults to FALSE.

Value

An object of class "stdGlm" is a list containing all input arguments to the stdGlm function, except data. In addition, the list contains:

fit

the fitted glm object.

means.est

a vector with length equal to length(x), where element j is equal to $\hat{\theta}$(x[j]).

means.vcov

a square matrix with length(x) rows, where the element on row i and column j is the (estimated) covariance of $\hat{\theta}$(x[i]) and $\hat{\theta}$(x[j]).
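
Because means.vcov holds the covariances between the estimates at different exposure levels, contrasts such as a difference in standardized means can be computed directly from the two components. The following is a minimal sketch using a hypothetical stand-in list (std_obj) with made-up numbers in place of a real "stdGlm" object; the Wald-type interval assumes the estimates are approximately normal.

```r
# Hypothetical stand-in for the relevant parts of a "stdGlm" object
# obtained with x = c(0, 1); the numbers are made up for illustration.
std_obj <- list(
  means.est  = c(0.30, 0.45),
  means.vcov = matrix(c(4e-04, 1e-04,
                        1e-04, 5e-04), nrow = 2)
)

# Difference theta(1) - theta(0), written as a linear contrast a' theta
a <- c(-1, 1)
diff_est <- sum(a * std_obj$means.est)
diff_se  <- sqrt(drop(t(a) %*% std_obj$means.vcov %*% a))

# Wald-type 95% confidence interval for the difference
ci <- diff_est + c(-1, 1) * qnorm(0.975) * diff_se
```

The same pattern applies to any linear contrast of the elements of means.est; only the vector a changes.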

Details

stdGlm assumes that a generalized linear model $$\eta\{E(Y|X,Z)\}=h(X,Z;\beta)$$ has been fitted. The maximum likelihood estimate of $\beta$ is used to obtain estimates of the mean $E(Y|X=x,Z)$: $$\hat{E}(Y|X=x,Z)=\eta^{-1}\{h(X=x,Z;\hat{\beta})\}.$$ For each $x$ in the x argument, these estimates are averaged across all subjects (i.e. all observed values of $Z$) to produce estimates $$\hat{\theta}(x)=\sum_{i=1}^n \hat{E}(Y|X=x,Z_i)/n.$$ The variance for $\hat{\theta}(x)$ is obtained by the sandwich formula.
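
The point estimate $\hat{\theta}(x)$ described here can be reproduced by hand with base R, which may help clarify what is being averaged. The following is a minimal sketch and not the package's implementation: the helper name standardize_by_hand is hypothetical, it assumes a numeric exposure, and it does not compute the sandwich variance.

```r
# Hypothetical helper (not part of the package): point estimate of the
# standardized mean theta(x) for each value in x, using only base R.
standardize_by_hand <- function(fit, data, X, x) {
  sapply(x, function(xj) {
    data_x <- data
    data_x[[X]] <- xj  # set the exposure to xj for every subject
    # predicted mean for each subject's observed Z, averaged over subjects
    mean(predict(fit, newdata = data_x, type = "response"))
  })
}

# Illustrative simulated data with a binary outcome
set.seed(1)
n <- 1000
Z <- rnorm(n)
X <- rnorm(n, mean = Z)
Y <- rbinom(n, 1, prob = (1 + exp(X + Z))^(-1))
dd <- data.frame(Z, X, Y)

fit <- glm(Y ~ X * Z, family = "binomial", data = dd)
theta_hat <- standardize_by_hand(fit, dd, X = "X", x = c(-1, 0, 1))
```

Each element of theta_hat is the average of the fitted probabilities after setting X to the corresponding value for all subjects while leaving Z as observed.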

References

Rothman K.J., Greenland S., Lash T.L. (2008). Modern Epidemiology, 3rd edition. Lippincott, Williams & Wilkins.

Examples

library(stdReg)

##Example 1: continuous outcome
n <- 1000
Z <- rnorm(n)
X <- rnorm(n, mean = Z)
Y <- rnorm(n, mean = X + Z + 0.1 * X^2)
dd <- data.frame(Z,X,Y)
fit <- glm(formula = Y ~ X + Z + I(X^2), data = dd)
fit.std <- stdGlm(fit = fit, data = dd, X = "X", x = seq(-3,3,0.5))
print(summary(fit.std))
plot(fit.std)

##Example 2: binary outcome
n <- 1000
Z <- rnorm(n)
X <- rnorm(n, mean = Z)
Y <- rbinom(n, 1, prob = (1 + exp(X + Z))^(-1))
dd <- data.frame(Z,X,Y)
fit <- glm(formula = Y ~ X + Z + X*Z, family = "binomial", data = dd)
fit.std <- stdGlm(fit = fit, data = dd, X = "X", x = seq(-3,3,0.5))
print(summary(fit.std))
plot(fit.std)
