geem2: Fit Generalized Estimating Equation Models

Description

geem2 is a modified version of geem that fits generalized estimating equation models and returns objects that can be used for simultaneous inference across multiple marginal models with mmmgee and mmmgee.test. Like geem, geem2 estimates coefficients and nuisance parameters using generalized estimating equations. The link and variance functions can be specified by the user, and the syntax is similar to glm.

Usage

geem2(formula, id, waves = NULL, data = parent.frame(),
  family = gaussian, corstr = "independence", Mv = 1,
  weights = NULL, corr.mat = NULL, init.beta = NULL,
  init.alpha = NULL, init.phi = 1, scale.fix = FALSE,
  nodummy = FALSE, sandwich = TRUE, useP = TRUE, maxit = 20,
  tol = 1e-05, restriction = NULL, conv.criterion = c("ratio",
  "difference"))

Arguments

formula

a formula expression similar to that for glm, of the form response~predictors. An offset is allowed, as in glm.

id

a vector identifying the clusters. By default, data are assumed to be sorted such that observations in a cluster are in consecutive rows and higher numbered rows in a cluster are assumed to be later. If NULL, then each observation is assigned its own cluster.

waves

a non-negative integer vector identifying components of a cluster. For example, this could be a time ordering. If integers are skipped within a cluster, then dummy rows with weight 0 are added in an attempt to preserve the correlation structure (except if corstr = "exchangeable" or "independence"). This can be skipped by setting nodummy=TRUE. When assessing missing values, waves are assumed to start at 1, so starting at a larger integer is computationally inefficient.
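The gap-filling idea can be illustrated in base R. The sketch below only identifies which wave indices would need a zero-weight dummy row in each cluster, counting from 1 as described above. It is not the package's internal code, and the helper name missing_waves is hypothetical.

```r
# Hypothetical helper: list the wave indices absent from each cluster,
# counting from 1 up to the largest wave observed in that cluster.
missing_waves <- function(id, waves) {
  lapply(split(waves, id), function(w) setdiff(seq_len(max(w)), w))
}

id    <- c(1, 1, 1, 2, 2)
waves <- c(1, 2, 4, 3, 4)
gaps  <- missing_waves(id, waves)
gaps
# Cluster 1 lacks wave 3. Cluster 2 lacks waves 1 and 2, because counting
# starts at 1 even though its first observed wave is 3.
```

The second cluster shows why starting waves at a large integer is wasteful: every index below the first observed wave counts as a gap.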

data

an optional data frame containing the variables in the model.

family

will determine the link and variance functions. The argument can be one of three options: a family object, a character string, or a list of functions. For more information on how to use family objects, see family. If the supplied argument is a character string, then the string should correspond to one of the family objects. In order to define a link function, a list must be created with the components (LinkFun, VarFun, InvLink, InvLinkDeriv), all of which are vectorized functions. If the components in the list are not named as (LinkFun, VarFun, InvLink, InvLinkDeriv), then geem2 assumes that the functions are given in that order. LinkFun and VarFun are the link and variance functions. InvLink and InvLinkDeriv are the inverse of the link function and the derivative of the inverse of the link function and so are decided by the choice of the link function.
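As an illustration of the list form, the sketch below writes out the logit link by hand and checks that the four components are mutually consistent. The object name logit_family is an arbitrary choice, and the commented geem2 call is an assumed usage that requires the mmmgee package and its keratosis data.

```r
# Logit link supplied as a list of vectorized functions, named as in the
# description above.
logit_family <- list(
  LinkFun      = function(mu)  log(mu / (1 - mu)),
  VarFun       = function(mu)  mu * (1 - mu),
  InvLink      = function(eta) 1 / (1 + exp(-eta)),
  InvLinkDeriv = function(eta) exp(-eta) / (1 + exp(-eta))^2
)

# Sanity checks: InvLink undoes LinkFun, and InvLinkDeriv agrees with a
# central finite difference of InvLink.
mu  <- c(0.1, 0.5, 0.9)
eta <- logit_family$LinkFun(mu)
h   <- 1e-6
fd  <- (logit_family$InvLink(eta + h) - logit_family$InvLink(eta - h)) / (2 * h)
all.equal(logit_family$InvLink(eta), mu)
all.equal(logit_family$InvLinkDeriv(eta), fd, tolerance = 1e-6)

# Assumed usage (not run):
# geem2(clearance ~ trt, id = id, data = keratosis, family = logit_family)
```

Checking the derivative numerically before fitting is a cheap way to catch a mismatch between InvLink and InvLinkDeriv.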

corstr

a character string specifying the correlation structure. Allowed structures are: "independence", "exchangeable", "ar1", "m-dependent", "unstructured", "fixed", and "userdefined". Any unique substring may be supplied. If "fixed" or "userdefined", then corr.mat must be specified. If "m-dependent", then Mv is relevant.

Mv

for "m-dependent", the value for m.

weights

a vector of weights for the inverse of the scale factor of each observation. If an observation is assigned weight 0, it is excluded from the calculations of any parameters. Observations with an NA in any variable will be assigned a weight of 0. Note that weights are defined differently in geem2 and geem; see Details.

corr.mat

the correlation matrix for "fixed". The matrix should be symmetric with dimensions >= the maximum cluster size. If the correlation structure is "userdefined", then this is a matrix describing which correlations are the same. In that case, all entries have to be integers, and values less than or equal to zero indicate a correlation of zero. The information regarding the user-defined structure is extracted from the upper triangle of the provided matrix.
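A label matrix for "userdefined" can be built in base R. The sketch below constructs a banded pattern for clusters of size 4 in which pairs at the same lag share a label and the most distant pair is set to zero correlation. Reading only the entries strictly above the diagonal is an assumption made here for illustration, and the object name labels is arbitrary.

```r
# Integer labels: equal labels mark correlations constrained to be equal,
# and 0 marks a correlation fixed at zero.
# Lag-1 pairs share label 2, lag-2 pairs share label 3, the lag-3 pair is 0.
labels <- toeplitz(c(1, 2, 3, 0))
labels

# Distinct positive labels found strictly above the diagonal
sort(unique(labels[upper.tri(labels) & labels > 0]))
```

The two distinct positive labels correspond to two correlation parameters to be estimated under this pattern.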

init.beta

an optional vector with the initial values of beta. If not specified, then the intercept will be set to InvLink(mean(response)). init.beta must be specified if not using an intercept.

init.alpha

an optional scalar or vector giving the initial values for the correlation. If provided along with Mv>1 or unstructured correlation, then the user must ensure that the vector is of the appropriate length.

init.phi

an optional initial scale parameter. If not supplied, initialized to 1.

scale.fix

if set to TRUE, then the scale parameter is fixed at the value of init.phi. See details.

nodummy

if set to TRUE, then dummy rows will not be added based on the values in waves.

sandwich

if TRUE, calculate robust variance.

useP

if set to FALSE, do not use the n-p correction for dispersion and correlation estimates, as in Liang and Zeger. This can be useful when the number of observations is small, as subtracting p may yield correlations greater than 1.
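The effect of the correction can be seen with a small numerical sketch. It assumes the usual moment estimator of the scale parameter, the sum of squared Pearson residuals divided by either N - p or N. The residual values are made up, and the exact computation inside geem2 (for instance its handling of weights) may differ.

```r
# Illustrative Pearson residuals (made-up numbers) from a model with
# p = 2 regression coefficients fitted to N = 6 observations.
pearson <- c(1, -1, 2, -2, 0.5, -0.5)
N <- length(pearson)
p <- 2

phi_with_p    <- sum(pearson^2) / (N - p)  # with the n - p correction
phi_without_p <- sum(pearson^2) / N        # without it
c(with_p = phi_with_p, without_p = phi_without_p)
```

With few observations the two denominators differ substantially, which is the situation the argument description refers to.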

maxit

maximum number of iterations.

tol

tolerance in calculation of coefficients.

restriction

either a contrast matrix or a list of a contrast matrix and a right hand side vector, defining a restriction on the regression coefficients. See details.

conv.criterion

convergence criterion, either "ratio" or "difference". The default is "ratio", which uses the relative change in the regression coefficient estimates as the convergence criterion, as in geem. With "difference", the maximum absolute difference in the regression coefficient estimates is used. The latter is required if some coefficient is 0, e.g. when estimating under a restriction.
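The reason a zero coefficient is a problem for the relative criterion can be shown with a short sketch. The two formulas below are assumed forms of the criteria, written for illustration. The helper names ratio_change and diff_change are hypothetical, and the package's actual expressions may differ in detail.

```r
# Assumed forms of the two criteria (helper names are hypothetical).
ratio_change <- function(new, old) max(abs((new - old) / old))
diff_change  <- function(new, old) max(abs(new - old))

old <- c(1.0, 0.5, 0)       # third coefficient held at 0
new <- c(1.0001, 0.5, 0)

ratio_change(new, old)      # not a number: 0 / 0 is undefined
diff_change(new, old)       # small and well defined
```

A relative change cannot be compared with tol when a coefficient is exactly 0, whereas the absolute difference remains usable.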

Value

A list with class geem2, similar to the output of geem from the geeM package. The additional slot sandwich.args contains components to calculate the sandwich variance estimator for the fitted model and across models if applied in the multiple marginal model framework.

Details

The function is a modification of geem from the geeM package that returns the additional output required to calculate the covariance matrix across multiple marginal models. In particular, the contributions of each subject to the estimating equation are made available in the output. Internal functions for the calculation of matrix inverses were modified to improve the handling of missing data.

In geem2, weights are defined as scale weights, as in most other software. Note that, in contrast, the current version of geem (version 0.10.1) uses residual weights.

The scale parameter phi is used in estimating the residual working correlation parameters and in estimating the model-based (naive) covariance matrix of the regression coefficients. As in most other software, requesting scale.fix=TRUE only has an impact on the latter, while the working correlation is still estimated using an empirical scale factor for the residuals. In contrast, geem also uses the fixed scale factor when estimating the working correlation.

geem2 allows for estimation of regression coefficients under linear restrictions \(L\beta=r\), where \(L\) is a contrast matrix, \(\beta\) the vector of regression coefficients and \(r\) a real-valued right hand side vector. \(L\) and \(r\) can be specified using the argument restriction. If only \(L\) is specified, \(r\) is assumed to be the null vector. This functionality is required in particular to calculate the generalized score test for linear hypotheses about \(\beta\). Use conv.criterion="difference" if any regression coefficient is restricted to 0.
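To make the notation concrete, the sketch below sets up \(L\) and \(r\) for the restriction that two slopes are equal and computes the restricted estimate in the simplest special case, ordinary least squares (identity link, independence working correlation), where a closed form exists. This is an illustration in base R on simulated data and is not the algorithm used by geem2. Passing the pair as list(L, r) is an assumed form of the restriction argument.

```r
# Simulated data (illustrative only)
set.seed(1)
n <- 50
X <- cbind(1, rnorm(n), rnorm(n))
y <- drop(X %*% c(1, 2, 2)) + rnorm(n)

# Restriction L beta = r: the two slopes are equal (beta_2 - beta_3 = 0)
L <- matrix(c(0, 1, -1), nrow = 1)
r <- 0

# Unrestricted least squares, then projection onto {beta : L beta = r}
XtX_inv  <- solve(crossprod(X))
beta_hat <- XtX_inv %*% crossprod(X, y)
adj      <- XtX_inv %*% t(L) %*% solve(L %*% XtX_inv %*% t(L), L %*% beta_hat - r)
beta_r   <- beta_hat - adj

drop(L %*% beta_r)   # equals r up to rounding error

# Assumed form of the corresponding geem2 argument (not run):
# restriction = list(L, r)
```

Each row of \(L\) encodes one linear constraint, so testing several hypotheses jointly means stacking rows in \(L\) and entries in \(r\).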

References

Lee S. McDaniel, Nicholas C. Henderson, Paul J. Rathouz. Fast pure R implementation of GEE: application of the Matrix package. The R Journal 5.1 (2013): 181.

Examples

# NOT RUN {
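library(mmmgee)

# keratosis data: binomial model with independence working correlation
data(keratosis)
m1 <- geem2(clearance ~ trt, id = id, data = keratosis, family = binomial,
            corstr = "independence")
summary(m1)
# Gaussian model on the rows with lesion == 1
m2 <- geem2(pain ~ trt, id = id, data = keratosis[keratosis$lesion == 1, ],
            family = gaussian, corstr = "independence")
summary(m2)
# Same model with an exchangeable working correlation
geem2(pain ~ trt, id = id, data = keratosis[keratosis$lesion == 1, ],
      family = gaussian, corstr = "exchangeable")

# datasim data: family given as a character string
data(datasim)
mod1 <- geem2(Y.lin ~ gr.lang + x1, id = id, data = datasim, family = "gaussian",
              corstr = "exchangeable")
summary(mod1)
mod2 <- geem2(Y.poi ~ gr.lang + x2, id = id, data = datasim, family = "poisson",
              corstr = "unstructured")
summary(mod2)
# User-defined structure; "user" is a unique substring of "userdefined"
mod3 <- geem2(Y.bin ~ gr.lang + x3, id = id, data = datasim, family = "binomial",
              corstr = "user",
              corr.mat = matrix(c(1,2,3,0, 2,1,2,3, 3,2,1,2, 0,3,2,1), 4, 4))
summary(mod3)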

# }