First, consider the sum of univariate log-likelihoods
$$L_1= \sum_{i=1}^n\sum_{j=1}^d\, \log f_1(y_{ij};\nu_{ij},\boldsymbol{\gamma})=\sum_{i=1}^n\sum_{j=1}^d\,
\ell_1(\nu_{ij},\boldsymbol{\gamma};y_{ij}),
$$
and then the sum of bivariate log-likelihoods
$$L_2=\sum_{i=1}^{n}\sum_{j<k}
\log{f_2(y_{ij},y_{ik};\nu_{ij},\nu_{ik},\boldsymbol{\gamma},\rho_{jk})}=\sum_{i=1}^{n}\sum_{j<k}\ell_2(\nu_{ij},\nu_{ik},\boldsymbol{\gamma},\rho_{jk};y_{ij},y_{ik}),$$
where \(\ell_1(\cdot)=\log f_1(\cdot)\), \(\ell_2(\cdot)=\log f_2(\cdot)\), and
$$
f_2(y_{ij},y_{ik};\nu_{ij},\nu_{ik},\boldsymbol{\gamma},\rho_{jk})=\int_{\Phi^{-1}[F_1(y_{ij}-1;\nu_{ij},\boldsymbol{\gamma})]}^{\Phi^{-1}[F_1(y_{ij};\nu_{ij},\boldsymbol{\gamma})]}
\int_{\Phi^{-1}[F_1(y_{ik}-1;\nu_{ik},\boldsymbol{\gamma})]}^{\Phi^{-1}[F_1(y_{ik};\nu_{ik},\boldsymbol{\gamma})]} \phi_2(z_j,z_k;\rho_{jk})\,dz_k\,dz_j;
$$
\(\phi_2(\cdot;\rho)\) denotes the standard bivariate normal density with correlation
\(\rho\).
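Since the integrand is the bivariate normal density over a rectangle, \(f_2\) reduces by inclusion-exclusion to four evaluations of the bivariate normal CDF \(\Phi_2\). Below is a minimal numerical sketch, assuming for illustration Poisson margins (so \(F_1(\cdot;\nu)\) is the Poisson CDF and there is no auxiliary parameter \(\boldsymbol{\gamma}\)); the helper name `bivariate_pmf` is ours, not from the source.

```python
import numpy as np
from scipy.stats import poisson, norm, multivariate_normal

def bivariate_pmf(y_j, y_k, nu_j, nu_k, rho):
    """P(Y_j=y_j, Y_k=y_k) under the discretized-MVN model with
    (illustrative) Poisson margins: a rectangle probability of Phi_2."""
    eps = 1e-12  # clamp CDF values so norm.ppf stays finite
    F = lambda y, nu: np.clip(poisson.cdf(y, nu), eps, 1 - eps)
    # Integration limits Phi^{-1}[F_1(y-1)] and Phi^{-1}[F_1(y)]
    a_j, b_j = norm.ppf(F(y_j - 1, nu_j)), norm.ppf(F(y_j, nu_j))
    a_k, b_k = norm.ppf(F(y_k - 1, nu_k)), norm.ppf(F(y_k, nu_k))
    Phi2 = lambda u, v: multivariate_normal.cdf([u, v], cov=[[1, rho], [rho, 1]])
    # Inclusion-exclusion over the four corners of the rectangle
    return Phi2(b_j, b_k) - Phi2(a_j, b_k) - Phi2(b_j, a_k) + Phi2(a_j, a_k)
```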
Let \(\mathbf{a}^\top=(\boldsymbol{\beta}^\top,\boldsymbol{\gamma}^\top)\) be the
column vector of all \(r=p+q\) univariate parameters. Differentiating \(L_1\) with respect to \(\mathbf{a}\) leads to the independent estimating equations or univariate composite score functions:
$$
\mathbf{g}_1=\mathbf{g}_1(\mathbf{a})=
\frac{\partial L_1}{\partial \mathbf{a}}=
\sum_{i=1}^n\mathbf{X}_i^\top\mathbf{s}_i^{(1)}(\mathbf{a})=\mathbf{0}.
$$
Differentiating \(L_2\) with respect to \(\mathbf{R}=\bigl(\rho_{jk},1\leq j<k\leq d\bigr)\)
leads to the bivariate composite score functions (Zhao and Joe, 2005):
$$
\mathbf{g}_2=\frac{\partial L_2}{\partial \mathbf{R}}= \sum_{i=1}^{n}\mathbf{s}_i^{(2)}(\mathbf{a},\mathbf{R})= \sum_{i=1}^n\Bigl(\mathbf{s}_{i,jk}^{(2)}(\mathbf{a},\rho_{jk}),1\leq j<k\leq d\Bigr)=\mathbf{0},
$$
where \(\mathbf{s}_i^{(2)}(\mathbf{a},\mathbf{R})=\frac{\partial\sum_{j<k}\ell_2(\nu_{ij},\nu_{ik},\boldsymbol{\gamma},\rho_{jk};y_{ij},y_{ik})}{\partial \mathbf{R}}\) and \(\mathbf{s}_{i,jk}^{(2)}(\mathbf{a},\rho_{jk})=\frac{\partial \ell_2(\nu_{ij},\nu_{ik},\boldsymbol{\gamma},\rho_{jk};y_{ij},y_{ik})}{\partial \rho_{jk}}\).
The CL1 estimates \(\widetilde{\mathbf{a}}\) and \(\widetilde{\mathbf{R}}\) of the discretized MVN model are obtained
by solving the estimating equations \(\mathbf{g}_1=\mathbf{0}\) and \(\mathbf{g}_2=\mathbf{0}\) above.
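Because the pairs \((j,k)\) enter \(L_2\) separately, solving these equations amounts to a two-stage procedure: first maximize \(L_1\) to obtain \(\widetilde{\mathbf{a}}\) (equivalently, solve \(\mathbf{g}_1=\mathbf{0}\)), then maximize each bivariate log-likelihood in \(\rho_{jk}\) with \(\mathbf{a}\) held fixed at \(\widetilde{\mathbf{a}}\). A hedged sketch of this scheme, continuing the Poisson-margin illustration (so \(\mathbf{a}=\boldsymbol{\beta}\) with log link \(\nu_{ij}=\exp(\mathbf{x}_{ij}^\top\boldsymbol{\beta})\)) and reusing `bivariate_pmf` from above; `fit_cl1` is our name, not from the source.

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar
from scipy.stats import poisson

def fit_cl1(Y, X):
    """Two-stage CL1 fit: Y is an (n, d) count matrix, X an (n, d, p)
    covariate array. Stage 1: independence likelihood for beta.
    Stage 2: one latent correlation rho_jk per pair (j, k)."""
    n, d, p = X.shape
    neg_L1 = lambda beta: -poisson.logpmf(Y, np.exp(X @ beta)).sum()
    beta = minimize(neg_L1, np.zeros(p)).x                      # stage 1
    nu = np.exp(X @ beta)
    rho = {}
    for j in range(d):
        for k in range(j + 1, d):                               # stage 2
            neg_l2 = lambda r: -sum(
                np.log(bivariate_pmf(Y[i, j], Y[i, k], nu[i, j], nu[i, k], r))
                for i in range(n))
            rho[(j, k)] = minimize_scalar(
                neg_l2, bounds=(-0.99, 0.99), method="bounded").x
    return beta, rho
```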
The asymptotic covariance matrix of the CL1 estimator, also known as the inverse Godambe information matrix (Godambe, 1991), is
$$
\mathbf{V}=(-\mathbf{H}_\mathbf{g})^{-1}\mathbf{J}_\mathbf{g}(-\mathbf{H}_\mathbf{g}^\top)^{-1},
$$
where \(\mathbf{g}=(\mathbf{g}_1,\mathbf{g}_2)^\top\). With \(\boldsymbol{\theta}=(\mathbf{a},\mathbf{R})^\top\),
$$
-\mathbf{H}_\mathbf{g}=E\Bigl(\frac{\partial \mathbf{g}}{\partial\boldsymbol{\theta}}\Bigr)=
\left[\begin{array}{cc}
E\Bigl(\frac{\partial \mathbf{g}_1}{\partial\mathbf{a}}\Bigr)
& E\Bigl(\frac{\partial \mathbf{g}_1}{\partial\mathbf{R}}\Bigr)\\
E\Bigl(\frac{\partial \mathbf{g}_2}{\partial\mathbf{a}}\Bigr) &
E\Bigl(\frac{\partial \mathbf{g}_2}{\partial\mathbf{R}}\Bigr)
\end{array}\right]=
\left[\begin{array}{cc}
-\mathbf{H}_{\mathbf{g}_1}&\mathbf{0}\\
-\mathbf{H}_{\mathbf{g}_{2,1}}&-\mathbf{H}_{\mathbf{g}_2}
\end{array}\right],
$$
where \(-\mathbf{H}_{\mathbf{g}_1}=\sum_{i=1}^n\mathbf{X}_i^\top\boldsymbol{\Delta}_i^{(1)}\mathbf{X}_i\),
\(-\mathbf{H}_{\mathbf{g}_{2,1}}=\sum_{i=1}^n\boldsymbol{\Delta}_i^{(2,1)}\mathbf{X}_i\),
and \(-\mathbf{H}_{\mathbf{g}_2}=\sum_{i=1}^n\boldsymbol{\Delta}_i^{(2,2)}\).
The covariance matrix \(\mathbf{J}_\mathbf{g}\) of the composite score functions \(\mathbf{g}\) is
$$
\mathbf{J}_\mathbf{g}=\mbox{Cov}(\mathbf{g})=
\left[\begin{array}{cc}
\mbox{Cov}(\mathbf{g}_1) & \mbox{Cov}(\mathbf{g}_1,\mathbf{g}_2)\\
\mbox{Cov}(\mathbf{g}_2,\mathbf{g}_1) & \mbox{Cov}(\mathbf{g}_2)
\end{array}\right]=
\left[\begin{array}{cc}\mathbf{J}_\mathbf{g}^{(1)}& \mathbf{J}_\mathbf{g}^{(1,2)}\\
\mathbf{J}_\mathbf{g}^{(2,1)}& \mathbf{J}_\mathbf{g}^{(2)}\end{array}\right]=
\sum_{i=1}^n\left[\begin{array}{cc}
\mathbf{X}_i^\top\boldsymbol{\Omega}_i^{(1)}
\mathbf{X}_i & \mathbf{X}_i^\top\boldsymbol{\Omega}_i^{(1,2)}\\
\boldsymbol{\Omega}_i^{(2,1)}\mathbf{X}_i& \boldsymbol{\Omega}_i^{(2)}\end{array}\right],
$$
where
$$\left[\begin{array}{cc}
\boldsymbol{\Omega}_i^{(1)}& \boldsymbol{\Omega}_i^{(1,2)}\\
\boldsymbol{\Omega}_i^{(2,1)}& \boldsymbol{\Omega}_i^{(2)}
\end{array}\right]=
\left[\begin{array}{cc}
\mbox{Cov}\Bigl(\mathbf{s}_i^{(1)}(\mathbf{a})\Bigr) &
\mbox{Cov}\Bigl(\mathbf{s}_i^{(1)}(\mathbf{a}),
\mathbf{s}_i^{(2)}(\mathbf{a},\mathbf{R})\Bigr)\\
\mbox{Cov}\Bigl(\mathbf{s}_i^{(2)}(\mathbf{a},\mathbf{R}),
\mathbf{s}_i^{(1)}(\mathbf{a})\Bigr) & \mbox{Cov}\Bigl(\mathbf{s}_i^{(2)}(\mathbf{a},\mathbf{R})\Bigr)
\end{array}\right].
$$
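Given the summed blocks, \(\mathbf{V}\) follows from plain linear algebra. A numpy sketch, assuming the arrays `H11`, `H21`, `H22`, and `J` already hold \(-\mathbf{H}_{\mathbf{g}_1}\), \(-\mathbf{H}_{\mathbf{g}_{2,1}}\), \(-\mathbf{H}_{\mathbf{g}_2}\), and \(\mathbf{J}_\mathbf{g}\) respectively (all names ours).

```python
import numpy as np

def godambe_cov(H11, H21, H22, J):
    """Inverse Godambe information V = (-H)^{-1} J (-H)^{-T}.
    H11 is r x r, H21 is m x r, H22 is m x m with m = d(d-1)/2,
    and J is the (r + m) x (r + m) covariance of g."""
    r, m = H11.shape[0], H22.shape[0]
    negH = np.zeros((r + m, r + m))
    negH[:r, :r] = H11   # -H_{g1}
    negH[r:, :r] = H21   # -H_{g2,1}; the upper-right block stays 0
    negH[r:, r:] = H22   # -H_{g2}   because g1 does not depend on R
    negH_inv = np.linalg.inv(negH)
    return negH_inv @ J @ negH_inv.T
```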
Finally, the composite likelihood versions of AIC (Varin and Vidoni, 2005) and BIC (Gao and Song, 2011) take the forms:
$$
\mbox{CL1AIC} = -2L_2 + 2\mbox{tr}\Bigl(\mathbf{J}_\mathbf{g}\mathbf{H}_\mathbf{g}^{-1}\Bigr),
$$
$$
\mbox{CL1BIC} =-2L_2+\log(n)\mbox{tr}\Bigl(\mathbf{J}_\mathbf{g}\mathbf{H}_\mathbf{g}^{-1}\Bigr).
$$
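Both criteria require only the fitted value of \(L_2\) and the trace penalty \(\mbox{tr}(\mathbf{J}_\mathbf{g}\mathbf{H}_\mathbf{g}^{-1})\); a small sketch following the displayed formulas (the function name is ours, and `H` is \(\mathbf{H}_\mathbf{g}\) in the text's sign convention).

```python
import numpy as np

def cl1_criteria(L2, J, H, n):
    """CL1AIC and CL1BIC from the fitted sum of bivariate log-likelihoods
    L2, the score covariance J = Cov(g), the matrix H = H_g, and n."""
    penalty = np.trace(J @ np.linalg.inv(H))
    return -2.0 * L2 + 2.0 * penalty, -2.0 * L2 + np.log(n) * penalty
```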