mmm: an R function to fit the multivariate marginal models to analyze multivariate longitudinal data

Description

fits multivariate marginal models to analyze multivariate longitudinal data, for both continuous and discrete responses

Usage

mmm(formula, id, data = NULL, correlation = NULL, initEstim = NULL, tol = 0.001, 
maxiter = 25, family = "gaussian", corStruct = "independence", Mv = 1, 
silent = TRUE, scale.fix = FALSE, scale.value = 1)

Arguments

formula

a formula expression, see the examples given below.

a vector for identification of the clusters.

data

an optional data frame.

correlation

a user specified square matrix for the working correlation matrix, appropriate when corstr="fixed".

initEstim

user specified initials for the parameter estimates.

tol

the tolerance which specifies the convergency of the algorithm.

maxiter

the maximum number of iterations to be consumed by the algorithm.

family

an object which defines the link and variance function. The possible choices are same with the ones in the "gee" package. For details see the gee documentation. Note that family=binomial handles multivariate longitudinal binary data, family=poisson handles multivariate longitudinal count data, family=gaussian handles multivariate longitudinal (normal type) continous data and family=gamma handles multivariate longitudinal (gamma type) continous data.

corStruct

a character string which defines the structure of the working correlation matrix. For details see the gee documentation.

specifies the lag value, e.g. specification of "corstr=AR-M" and "Mv=1" indicates AR(1).

silent

a logial variable which decides the print of the iterations.

scale.fix

a logical variable for fixing the scale parameter to a user specified value.

scale.value

a user specified scale parameter value, appropriate when scale.fix=TRUE.

Value

Returns an onject of the results. See the examples given below.

References

Asar, O. (2012). On multivariate longitudinal binary data models and their applications in forecasting. MS Thesis, Middle East Technical University. Available online at http://www.lancaster.ac.uk/pg/asar/thesis-Ozgur-Asar

Asar, O., Ilk, O. (2013). mmm: an R package for analyzing multivariate longitudinal data with multivariate marginal models, Computer Methods and Programs in Biomedicine, 112, 649-654.

Liang, K. L., Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13-22.

Shelton, B. J., Gilbert, G. H., Liu, B., Fisher, M. (2004). A SAS macro for the analysis of multivariate longitudinal binary outcomes. Computer Methods and Programs in Biomedicine, 76, 163-175.

Zeger, S. L., Liang, K. L (1986). Longitudinal data analysis for discrete and continous outcomes. Biometrics, 42, 121-130.

Examples

Run this code

#########################
## Binary data example ##
#########################

data(motherStress)
fit1<-mmm(formula=cbind(stress,illness)~married+education+
employed+chlth+mhlth+race+csex+housize+bstress+billness+
week,id=motherStress$id,data=motherStress,family=binomial,corStruct="exchangeable")
summary(fit1)

########################
## Count data example ##
########################

## First we illustrate how the data set is simulated
## Then the R script to analyze the data set by mmm is given
## Note: no need to run the script to generate the data set, unless of interest

## Not run: 
# ### Generating the data by the help of 'corcounts' package
# 
# # loading the package 'corcounts'
# library("corcounts")
# 
# # setting the seed to 12
# set.seed(12)
# 
# # number of subjects in the study
# n1 <- 500
# 
# # defining the response and covariate families (Poi indicates Poisson distribution)
# margins <- c("Poi","Poi","Poi","Poi","Poi","Poi","Poi","Poi","Poi")
# 
# # the means of the responses and covariate. while 5 and 8 are the means of the responses
# # 20 is the mean of the time independent covariate
# mu <- c(5, 8, 20, 5, 8, 5, 8, 5, 8)
# 
# # the correlation structure which 'corcounts' uses to generate correlated count data
# # (unstr indicates unstructured correlation structure)
# corstr <- "unstr"
# 
# # the correlation matrix corcounts assumes the correlated count data have
# corpar<-matrix(c(1,0.4,0.6,0.9,0.37,0.8,0.34,0.7,0.31,
# 0.4,1,0.6,0.37,0.9,0.34,0.8,0.31,0.7,
# 0.6,0.6,1,0.6,0.6,0.6,0.6,0.6,0.6,
# 0.9,0.37,0.6,1,0.4,0.9,0.37,0.8,0.34,
# 0.37,0.9,0.6,0.4,1,0.37,0.9,0.34,0.8,
# 0.8,0.34,0.6,0.9,0.37,1,0.4,0.9,0.37,
# 0.34,0.8,0.6,0.37,0.9,0.4,1,0.37,0.9,
# 0.7,0.31,0.6,0.8,0.34,0.9,0.37,1,0.4,
# 0.31,0.7,0.6,0.34,0.8,0.37,0.9,0.4,1),ncol=9,byrow=T)
# 
# # generating the correlated count data by 'rcounts' function avaiable in 'corcounts'
# data1 <- rcounts(N=n1,margins=margins,mu=mu,corstr=corstr,corpar=corpar)
# 
# ### The reconstruction of the generated correlated count data to 
# ### the longitudinal data (long) format
# 
# # seperating the bivariate responses measured at the first time 
# # point and the time independent covariate 
# time11<-data1[,1:3]
# 
# # seperating the bivariate responses measured at the second time
# # point and combining them with the time independent covariate
# time12<-cbind(data1[,4:5],data1[,3])
# 
# # seperating the bivariate responses measured at the third time
# # point and combining them with the time independent covariate
# time13<-cbind(data1[,6:7],data1[,3])
# 
# # seperating the bivariate responses measured at the fourth time
# # point and combining them with the time independent covariate
# time14<-cbind(data1[,8:9],data1[,3])
# 
# # combining the data for all the time points
# data12<-rbind(time11,time12,time13,time14)
# 
# # constructing the time variable
# time1<-matrix(rep(seq(1:4),each=n1))
# 
# # constructing the id variable 
# id1<-matrix(rep(seq(1:n1),4))
# 
# # combining the id of the subjects, the simulated data and the time variable
# data13<-cbind(id1,data12,time1)
# 
# # reconstructing the data subject by subject which 'mmm' expects it has 
# data14<-NULL
# for (i in 1:n1) data14<-rbind(data14,data13[data13[,1]==i,])
# 
# ### Data manipulations on the covariates
# 
# # taking natural logarithm of the time independent covariate
# data14[,4]<-log(data14[,4])
# 
# # standardizing time variable
# data14[,5]<-scale(data14[,5])
# 
# # adding the interaction of the time independent covariate 
# # and time as a new covariate
# multiLongCount<-as.data.frame(cbind(data14,data14[,4]*data14[,5]))
# names(multiLongCount)<-c("ID","resp1","resp2","X","time","X.time")
# ## End(Not run)

### R script to analyze the count data set
### It is already stored in mmm pacakge

data(multiLongCount)
fit2<-mmm(formula=cbind(resp1,resp2)~X+time+X.time,
id=multiLongCount$ID,data=multiLongCount,family=poisson,corStruct="unstructured")
summary(fit2)

#############################
## Continuous data example ##
#############################

## First we illustrate how the data set is simulated
## Then the R script to analyze the data set by mmm is given
## Note: no need to run the script to generate the data set, unless of interest

## Not run: 
# ### Generating the data by the help of mvtnorm package
# 
# # loading package 'mvtnorm' 
# library("mvtnorm")
# 
# # number of subjects in the study
# n2<-500
# 
# # setting the seed to 12
# set.seed(12)
# 
# # specifying the correlation matrix which 'mvtnorm' assumes the correlated data have
# cormat<-matrix(c(1,0.4,0.6,0.9,0.37,0.8,0.34,0.7,0.31,
# 0.4,1,0.6,0.37,0.9,0.34,0.8,0.31,0.7,
# 0.6,0.6,1,0.6,0.6,0.6,0.6,0.6,0.6,
# 0.9,0.37,0.6,1,0.4,0.9,0.37,0.8,0.34,
# 0.37,0.9,0.6,0.4,1,0.37,0.9,0.34,0.8,
# 0.8,0.34,0.6,0.9,0.37,1,0.4,0.9,0.37,
# 0.34,0.8,0.6,0.37,0.9,0.4,1,0.37,0.9,
# 0.7,0.31,0.6,0.8,0.34,0.9,0.37,1,0.4,
# 0.31,0.7,0.6,0.34,0.8,0.37,0.9,0.4,1),ncol=9,byrow=T)
# 
# # variances of the responses and time independent covariate
# # while 0.97 and 1.1 correspond to the variances of the bivariate responses
# # 4 corresponds to the variance of the time independent covariate
# variance<-c(0.97,1.1,4,0.97,1.1,0.97,1.1,0.97,1.1)
# 
# # constructing the (diaonal) standard deviation matrix 
# std<-diag(sqrt(variance),9)
# 
# # constructing the variance covariance matrix, sigma
# sigma<-std
# 
# # generating the correlated continuous data utilizing 'rmvnorm' function available
# # in 'mvtnorm'; method="svd" indicates use of 'singular value decomposition method
# data2<-rmvnorm(n2,mean = rep(0,nrow(sigma)),sigma=sigma,method="svd")
# 
# 
# ### The reconstruction of the generated correlated continuous data to the 
# ### longitudinal data (long) format
# 
# # seperating the bivariate responses measured at first time point 
# # and the time independent covariate
# time21<-data2[,1:3]
# 
# # seperating the bivariate responses measured at second time point
# # and combining them with the time independent covariate
# time22<-cbind(data2[,4:5],data2[,3])
# 
# # seperating the bivariate responses measured at third time point
# # and combining them with the time independent covariate
# time23<-cbind(data2[,6:7],data2[,3])
# 
# # seperating the bivariate responses measured at fourth time point
# # and combining them with the time independent covariate
# time24<-cbind(data2[,8:9],data2[,3])
# 
# # combining the data for all the time points
# data22<-rbind(time21,time22,time23,time24)
# 
# # constructing the time variable
# time2<-matrix(rep(seq(1:4),each=n2))
# 
# # constructing the id variable
# id2<-matrix(rep(seq(1:n2),4))
# 
# # combining the id of the subjects, the generated data and the time variable
# data23<-cbind(id2,data22,time2)
# 
# # reconstructing the data subject by subject which 'mmm' expects it has
# data24<-NULL
# for (i in 1:n2) data24<-rbind(data24,data23[data23[,1]==i,])
# 
# ### Data manipulations on the covariates
# 
# # standardizing the time variable
# data24[,5]<-scale(data24[,5])
# 
# # adding the interaction of the time independent covariate 
# # and time as a new covariate
# multiLongGaussian<-as.data.frame(cbind(data24,data24[,4]*data24[,5]))
# names(multiLongGaussian)<-c("ID","resp1","resp2","X","time","X.time")
# ## End(Not run)


### R script to analyze the continuous data set
### It is already stored in mmm pacakge

data(multiLongGaussian)
fit3<-mmm(formula=cbind(resp1,resp2)~X+time+X.time,
id=multiLongGaussian$ID,data=multiLongGaussian,family=gaussian,corStruct="unstructured")
summary(fit3)