mle.ar1.fun: Performs the maximum likelihood estimation for the negative binomial mixed-effect AR(1) model

Description

This function fits a negative binomial mixed-effect AR(1) model in the formulation described Zhao et al.Under this model, the count response measure Y_ij from a subject i at time point j (i=1,...,N,j=1,...,n_i) is from the negative binomial distribution: Y_ij | G_i =g_i ~ NB(r_ij,p_i),and the parametrization of r_ij, p_i and the distributional assumption on G_i are the same as the independence model (mle.fun).Given emph{ G_i=g_i}, Y_ij depends on Y_i(j-1) through the beta binomial thinning and is conditionally independent on Y_ij' given Y_i(j-1) for all j' < j-1.The beta binomial thinning operator depends on a parameter d which indicates the strength of the positive AR(1) association between repeated measures of a subject: the larger d, the stronger the positive correlations between the repeated measures of the same subject are.This interpretation depends on the result:cov(Y_ij,Y_ij'|G_i=g_i)=d^{j-j'} E(Y_ij')*(a*g_i^2+g_i).See Zhao et al., for more details.

Usage

mle.ar1.fun(formula, data, ID, Vcode,
            p.ini=NULL, IPRT = FALSE, model = "G", 
            i.tol = 1e-75, o.tol = 0.001)

Arguments

formula

An object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted.The formula must contain an intercept term.

data

A data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. Each row must contain the data corresponding to the repeated measure j of subject and the rows (i,j)s mus

A vector of length sum_i^N n_i, containing the subject IDs. i.e., c(rep(ID_1,n_1),rep(ID_2,n_2),...,rep(ID_N,n_N))

Vcode

A vector of length the total number of repeated measures, containing the indices of time point. For example, there are three subjects and the first two subjects do not have missing visits and completed five visits while the last subject missed the third

p.ini

Initial values for the parameters (log(a), log(theta),logit(d) beta0, beta1, ...)

IPRT

A logical, passed to Iprt of function optim. If TRUE then print iterations.

model

The distribution of random effects G_i. If model="G" then the random effects are assumed to be from the gamma distribution. If model="N" then they are assumed to be form the log-normal distribution.

i.tol

A real number to determine the tolerance for integrate.

o.tol

A real number to determine the tolerance for optim.

Value

optThe values returned by optim.
nlkThe value of the negative log-likelihood corresponding to opt$par
VThe approximated asymptotic covariance matrix of the maximum likelihood estimators. V=solve(opt$hessian)
estThe (4 + # covariates) by 2 matrix. The first column contains the estimates of the fixed effects, (log(a), log(theta),logit(d), beta0,beta1,...)The second column contains the approximated standard deviations of the estimators, i.e., sqrt(diag(V))
modIf model="G" then mod="G". If model="N" then mod="N".
VcodeA vector containing the input indices of time point.
cor"ar1", indicating that the model assumes AR1 correlation structure of the count responses given the random effects.

Details

mle.ar1.fun calls optim to minimize the negative log-likelihood of the negative binomial model with respect to the model parameters: c(log(a), log(theta), logit(d), beta0, beta1, ...).The Nelder-Mead algorithm is employed.The log-likelihood is obtained by marginalizing out the random effects.The numerical integration is carried out using adaptive quadrature.

When missing visits are present, an approximation of the likelihood is used (see Web Appendix II in Zhao et al for details.)

References

Zhao, Y., Li, D.K.B., Petkau, J.A., Riddehough, A. & Traboulsee, A. Detection of unusual increases in MRI lesion counts in multiple sclerosis patients.

Examples

Run this code

## ==================================================================================
## generate a data based on the negative binomial mixed-effect AR(1) model.
## Under this model, the response counts follows the negative binomial:
## Y_ij | G_i = g_i ~ NB(r_ij,p_i) where r_ij = exp(X^T beta)/a , p_i =1/(a*g_i+1)
## with G_i ~ Gamma(scale=th,shape=1/th)
## 
## The adjacent repeated measures of the same subject are correlated 
## with correlation structure:
## cov(Y_ij,Y_ij'|G_i=g_i)=d^{j-j'} E(Y_ij')*(a*g_i^2+g_i)  

# log(a) = -0.5, log(th)=1.3, logit(delta) = -0.2
# b0 =  0.5, no covariates; 
loga = -0.5
logtheta= 1.3
logitd = -0.2
b0 = 0.5
# 80 subjects each with 5 scans
n = 80
sn = 5


DT2 =  rNBME.R(gdist = "G",
               n = n, ## 	the total number of subjectss	       
	       sn = sn,
	       th=exp(logtheta),
               u1 = rep(exp(b0),sn),
	       u2 = rep(exp(b0),sn),
	       a = exp(loga),
	       d = exp(logitd)/(1+exp(logitd))
	      )
Vcode=rep(-1:(sn-2),n) # scan number -1, 0, 1, 2, 3
ID = DT2$id
new = Vcode > 0
dt2=data.frame(CEL=DT2$y)

## ================================================================================

## 1) Fit the negative binomial mixed-effect AR(1) model 
## where the random effects are from the gamma distribution
## This is the true model

re.gamma.ar1=mle.ar1.fun(formula=CEL~1,data=dt2,ID=ID,
		      Vcode=Vcode, 
		      p.ini=c(loga,logtheta,logitd,b0), 
		      ## log(a), log(theta), logit(d), b0
		       model="G", 
		      IPRT=TRUE) 



## compute the estimates of the conditional probabilities 
## with sum of the new repeated measure as a summary statistics 
Psum=index.batch(olmeNB=re.gamma.ar1,data=dt2,ID=ID,Vcode=Vcode,
		 labelnp=new,qfun="sum", IPRT=TRUE,i.se=FALSE)  
		 

## compute the estimates of the conditional probabilities 
## with max of the new repeated measure as a summary statistics 
Pmax=index.batch(olmeNB=re.gamma.ar1,data=dt2,ID=ID,Vcode=Vcode, 
                 labelnp=new,qfun="max", IPRT=TRUE)

## Which patient's estimated probabilities based on the sum and max 
## statistics disagrees the most?
( IDBigDif <- which(rank(abs(Pmax$condProbSummary[,1]-Psum$condProbSummary[,1]))==80) )
## Show the patient's CEL counts  
dt2$CEL[ID==IDBigDif]
## Show the estimated conditional probabilities based on the sum summary statistics
Psum$condProbSummary[IDBigDif,]
## Show the estimated conditional probabilities based on the max summary statistics
Pmax$condProbSummary[IDBigDif,]


## 2) Fit the negative binomial mixed-effect AR(1) model 
## where random effects is from the log-normal distribution

re.logn.ar1=mle.ar1.fun(formula=CEL~1,data=dt2,ID=ID,
		      Vcode=Vcode, 
		      p.ini=c(loga,logtheta,logitd,b0), 
		      ## log(a), log(theta), logit(d), b0
		       model="N", 
		      IPRT=TRUE)

## Require some time
Psum=index.batch(olmeNB=re.logn.ar1,data=dt2,ID=ID,Vcode=Vcode, 
                 labelnp=new,qfun="sum", IPRT=TRUE,i.se=TRUE) 
re.logn.ar1$Psum=Psum




## 3) Fit the negative binomial independent model 
## where random effects are from the gamma distribution
re.gamma.ind=mle.fun(formula=CEL~1,data=dt2,ID=ID, 
                   model="G", 
		   p.ini=c(loga,logtheta,b0), 
		   IPRT=TRUE)

Psum=index.batch(olmeNB=re.gamma.ind,data=dt2,ID=ID, 
                 labelnp=new,qfun="sum", IPRT=TRUE,i.se=TRUE)  




## 4) Fit the negative binomial independent model 
## where random effects are from the lognormal distribution
re.logn.ind=mle.fun(formula=CEL~1,data=dt2,ID=ID, 
                   model="N", 			   	
		   p.ini=c(loga,logtheta,b0), 		
		   IPRT=TRUE)

Psum=index.batch(olmeNB=re.logn.ind, data=dt2,ID=ID, 
                 labelnp=new,qfun="sum", IPRT=TRUE,i.se=TRUE) 


## 5) Fit the semi-parametric negative binomial AR(1) model 

logvarG = -0.5

re.semi.ar1=mle.ar1.non3(formula=CEL~1,data=dt2,ID=ID, 
                         p.ini=c(loga, logvarG, logitd,b0),Vcode=Vcode)

Psum=index.batch(olmeNB=re.semi.ar1,data=dt2,ID=ID, Vcode=Vcode,
		 labelnp=new,qfun="sum", IPRT=TRUE,i.se=FALSE)  
## No option of i.se=TRUE



## 6) Fit the semi-parametric negative binomial independent model 
re.semi.ind=mle.a3.fun(formula=CEL~1,data=dt2,ID=ID, p.ini=c(loga, logvarG, b0))
Psum=index.batch(olmeNB=re.semi.ind,data=dt2,ID=ID,  
                 labelnp=new, qfun="sum", IPRT=TRUE,i.se=FALSE) 
## No option of i.se=TRUE



## ======================================================================== ##
## == Which model performed the best in terms of the estimation of beta0 == ##
## ======================================================================== ##

getpoints <- function(y,estb0,sdb0=NULL,crit=qnorm(0.975))
{	
points(estb0,y,col="blue",pch=16)
if (!is.null(sdb0))
{
points(c(estb0-crit*sdb0,estb0+crit*sdb0),rep(y,2),col="red",type="l")
}
}
ordermethod <- c("gamma.ar1","logn.ar1","gamma.ind","logn.ind","semi.ar1","semi.ind")

estb0s <- c(
re.gamma.ar1$est[4,1],
re.logn.ar1$est[4,1],
re.gamma.ind$est[3,1],
re.logn.ind$est[3,1],
re.semi.ar1$est[4],
re.semi.ind$est[3]
)

## The true beta0 is:
b0
c <- 1.1
plot(0,0,type="n",xlim=c(min(estb0s)-0.5,max(estb0s)*c),ylim=c(0,7),yaxt="n",
main="Simulated from the AR(1) model \n with random effect ~ gamma")

legend("topright",
	legend=ordermethod)
abline(v=b0,lty=3)

## 1) gamma.ar1
sdb0 <- re.gamma.ar1$est[4,2]
getpoints(6,estb0s[1],sdb0)

## 2)logn.ar1
sdb0 <- re.logn.ar1$est[4,2] 
getpoints(5,estb0s[2],sdb0)

## 3) gamma.ind
sdb0 <- re.gamma.ind$est[3,2]
getpoints(4,estb0s[3],sdb0)

## 4) logn.ind
sdb0 <- re.logn.ind$est[3,2]
getpoints(3,estb0s[4],sdb0)

## 5) semi.ar1
getpoints(2,estb0s[5])

## 6) semi.ind
getpoints(1,estb0s[6])

Run the code above in your browser using DataLab