geese: Function to solve a Generalized Estimating Equation Model

Description

Produces an object of class `geese' which is a Generalized Estimating Equation fit of the data.

Usage

geese(formula = formula(data), sformula = ~1, id, waves = NULL,
      data = parent.frame(), subset = NULL, na.action = na.omit,
      contrasts = NULL, weights = NULL, zcor = NULL, corp = NULL,
      control = geese.control(...), b = NULL, alpha = NULL, gm = NULL,
      family = gaussian(), mean.link = NULL, variance = NULL,
      cor.link = "identity", sca.link = "identity", link.same = TRUE,
      scale.fix = FALSE, scale.value = 1, corstr = "independence", ...)

geese.fit(x, y, id, offset = rep(0, N), soffset = rep(0, N),
          weights = rep(1,N), waves = NULL, zsca = matrix(1, N, 1),
          zcor = NULL, corp = NULL, control = geese.control(...),
          b = NULL, alpha = NULL, gm = NULL, family = gaussian(),
          mean.link = NULL, variance = NULL, cor.link = "identity",
          sca.link = "identity", link.same = TRUE, scale.fix = FALSE,
          scale.value = 1, corstr = "independence", ...)

Arguments

formula

a formula expression as for glm, of the form response ~ predictors. See the documentation of lm and formula for details. As for glm, this specifies the linear predictor for modeling the mean. A term of the form

sformula

a formula expression of the form ~ predictor, the response being ignored. This specifies the linear predictor for modeling the dispersion. A term of the form offset(expression) is allowed.

a vector which identifies the clusters. The length of `id' should be the same as the number of observations. Data are assumed to be sorted so that observations on a cluster are contiguous rows for all entities in the formula.

waves

an integer vector which identifies components in clusters. The length of waves should be the same as the number of observation. components with the same waves value will have the same link functions.

data

an optional data frame in which to interpret the variables occurring in the formula, along with the id and n variables.

subset

expression saying which subset of the rows of the data should be used in the fit. This can be a logical vector (which is replicated to have length equal to the number of observations), or a numeric vector indicating which observation numbe

na.action

a function to filter missing data. For gee only na.omit should be used here.

contrasts

a list giving contrasts for some or all of the factors appearing in the model formula. The elements of the list should have the same name as the variable and should be either a contrast matrix (specifically, any full-rank matrix with as ma

weights

an optional vector of weights to be used in the fitting process. The length of weights should be the same as the number of observations. This weights is not (yet) the weight as in sas proc genmod, and hence is not recommended

zcor

a design matrix for correlation parameters.

corp

known parameters such as coordinates used for correlation coefficients.

control

a list of iteration and algorithmic constants. See geese.control for their names and default values. These can also be set as arguments to geese itself.

an initial estimate for the mean parameters.

alpha

an initial estimate for the correlation parameters.

an initial estimate for the scale parameters.

family

a description of the error distribution and link function to be used in the model, as for glm.

mean.link

a character string specifying the link function for the means. The following are allowed: "identity", "logit", "probit", "cloglog", "log", and "inverse". The

variance

a character string specifying the variance function in terms of the mean. The following are allowed: "gaussian", "binomial", "poisson", and "gamma". The default value is determined from fa

cor.link

a character string specifying the link function for the correlation coefficients. The following are allowed: "identity", and "fisherz".

sca.link

a character string specifying the link function for the scales. The following are allowed: "identity", and "log".

link.same

a logical indicating if all the components in a cluster should use the same link.

scale.fix

a logical variable; if true, the scale parameter is fixed at the value of scale.value.

scale.value

numeric variable giving the value to which the scale parameter should be fixed; used only if scale.fix == TRUE.

corstr

a character string specifying the correlation structure. The following are permitted: "independence", "exchangeable", "ar1", "unstructured", "userdefined", and

x, y

x is a design matrix of dimension n * p, and y is a vector of observations of length n.

offset, soffset

vector of offset for the mean and for the scale, respectively.

zsca

a design matrix of dimension n * r for the scales.

...

further arguments passed to or from other methods.

Value

An object of class "geese" representing the fit.

Details

when the correlation structure is fixed, the specification of Zcor should be a vector of length sum(clusz * (clusz - 1)) / 2.

References

Yan, J. and J.P. Fine (2004) Estimating Equations for Association Structures. Statistics in Medicine, 23, 859--880.

Examples

Run this code

data(seizure)
## Diggle, Liang, and Zeger (1994) pp166-168, compare Table 8.10
seiz.l <- reshape(seizure,
                  varying=list(c("base","y1", "y2", "y3", "y4")),
                  v.names="y", times=0:4, direction="long")
seiz.l <- seiz.l[order(seiz.l$id, seiz.l$time),]
seiz.l$t <- ifelse(seiz.l$time == 0, 8, 2)
seiz.l$x <- ifelse(seiz.l$time == 0, 0, 1)
m1 <- geese(y ~ offset(log(t)) + x + trt + x:trt, id = id,
            data=seiz.l, corstr="exch", family=poisson)
summary(m1)
m2 <- geese(y ~ offset(log(t)) + x + trt + x:trt, id = id,
            data = seiz.l, subset = id!=49,
            corstr = "exch", family=poisson)
summary(m2)
## Using fixed correlation matrix
cor.fixed <- matrix(c(1, 0.5, 0.25, 0.125, 0.125,
                      0.5, 1, 0.25, 0.125, 0.125,
                      0.25, 0.25, 1, 0.5, 0.125,
                      0.125, 0.125, 0.5, 1, 0.125,
                      0.125, 0.125, 0.125, 0.125, 1), 5, 5)
cor.fixed
zcor <- rep(cor.fixed[lower.tri(cor.fixed)], 59)
m3 <- geese(y ~ offset(log(t)) + x + trt + x:trt, id = id,
            data = seiz.l, family = poisson,
            corstr = "fixed", zcor = zcor)
summary(m3)

data(ohio)
fit <- geese(resp ~ age + smoke + age:smoke, id=id, data=ohio,
             family=binomial, corstr="exch", scale.fix=TRUE)
summary(fit)
fit.ar1 <- geese(resp ~ age + smoke + age:smoke, id=id, data=ohio,
                 family=binomial, corstr="ar1", scale.fix=TRUE)
summary(fit.ar1)

###### simulated data
## a function to generate a dataset
gendat <- function() {
  id <- gl(50, 4, 200)
  visit <- rep(1:4, 50)
  x1 <- rbinom(200, 1, 0.6) ## within cluster varying binary covariate
  x2 <- runif(200, 0, 1)   ## within cluster varying continuous covariate
  phi <- 1 + 2 * x1         ## true scale model
  ## the true correlation coefficient rho for an ar(1)
  ## correlation structure is 0.667.
  rhomat <- 0.667 ^ outer(1:4, 1:4, function(x, y) abs(x - y))
  chol.u <- chol(rhomat)
  noise <- as.vector(sapply(1:50, function(x) chol.u %*% rnorm(4)))
  e <- sqrt(phi) * noise
  y <- 1 + 3 * x1 - 2 * x2 + e
  dat <- data.frame(y, id, visit, x1, x2)
  dat
}

dat <- gendat()
fit <- geese(y ~ x1 + x2, id = id, data = dat, sformula = ~ x1,
             corstr = "ar1", jack = TRUE, j1s = TRUE, fij = TRUE)
summary(fit)


#### create user-defined design matrix of unstrctured correlation.
#### in this case, zcor has 4*3/2 = 6 columns, and 50 * 6 = 300 rows
zcor <- genZcor(clusz = rep(4, 50), waves = dat$visit, "unstr")
zfit <- geese(y ~ x1 + x2, id = id, data = dat, sformula = ~ x1,
              corstr = "userdefined", zcor = zcor,
              jack = TRUE, j1s = TRUE, fij = TRUE)
summary(zfit)

#### Now, suppose that we want the correlation of 1-2, 2-3, and 3-4
#### to be the same. Then zcor should have 4 columns.
z2 <- matrix(NA, 300, 4)
z2[,1] <- zcor[,1] + zcor[,4] + zcor[,6]
z2[,2:4] <- zcor[, c(2, 3, 5)]
summary(geese(y ~ x1 + x2, id = id, data = dat, sformula = ~ x1,
              corstr = "userdefined", zcor = z2,
              jack = TRUE, j1s = TRUE, fij = TRUE))

#### Next, we introduce non-constant cluster sizes by
#### randomly selecting 60 percent of the data
good <- sort(sample(1:nrow(dat), .6 * nrow(dat))) 
mdat <- dat[good,]

summary(geese(y ~ x1 + x2, id = id, data = mdat, waves = visit,
              sformula = ~ x1, corstr="ar1",
              jack = TRUE, j1s = TRUE, fij = TRUE))

Run the code above in your browser using DataLab