longiPenal: Fit a Joint Model for Longitudinal Data and a Terminal Event

Description

Fit a joint model for longitudinal data and a terminal event using a semiparametric penalized likelihood estimation or a parametric estimation on the hazard function.

The longitudinal outcomes $y_i(t_{ik})$ ($k=1,\ldots,n_i$, $i=1,\ldots,N$) for $N$ subjects are described by a linear mixed model and the risk of the terminal event is represented by a proportional hazard risk model. The joint model is constructed assuming that the processes are linked via a latent structure (Wulfsohn and Tsiatsis 1997):

$$\left{ \begin{array}{ll} y_{i}(t_{ik})=\bold{X}_{Li}(t_{ik})^\top \bold{\beta}_L +\bold{ Z}_i(t_{ik})^\top \bold{b}_i + \epsilon_i(t_{ik}) & \mbox{(Longitudinal)} \ \lambda_i(t|\bold{b}_i)=\lambda_0(t)\exp(\bold{X}_{Ti}(t)\bold{\beta}_T+h(\bold{b}_i,\bold{\beta}_L,\bold{Z}_i(t),\bold{X}_{Li}(t))^\top \bold{\eta}_T ) & \mbox{(Terminal)} \ \end{array} \right.$$

where $\bold{X}_{Li}(t)$ and $\bold{X}_{Ti}$ are vectors of fixed effects covariates and $\bold{\beta}_L$ and $\bold{\beta}_T$ are the associated coefficients. Measurements errors $\epsilon_i(t_{ik})$ are iid normally distributed with mean 0 and variance $\sigma_{\epsilon}^2$. The random effects $\bold{b}_i = (b_{0i},\ldots, b_{qi})^\top\sim \mathcal{N}(0,\bold{B}_1)$ are associated to covariates $\bold{Z}_i(t)$ and independent from the measurement error. The relationship between the two processes is explained via $h(\bold{b}_i,\bold{\beta}_L,\bold{Z}_i(t),\bold{X}_{Li}(t))$ with coefficients $\bold{\eta}_T$. Two forms of the function $h(\cdot)$ are available: the random effects $\bold{b}_i$ and the current biomarker level $m_i(t)=\bold{X}_{Li}(t_{ik})^\top \bold{\beta}_L +\bold{ Z}_i(t_{ik})^\top \bold{b}_i$.

We consider that the longitudinal outcome can be a subject to a quantification limit, i.e. some observations, below a level of detection $s$ cannot be quantified (left-censoring).

Usage

longiPenal(formula, formula.LongitudinalData, data,  data.Longi,
      random, id, intercept = TRUE, link = "Random-effects",
      left.censoring = FALSE, n.knots, kappa, maxit = 350, 
      hazard = "Splines", init.B, init.Random, init.Eta, 
      method.GH = "Standard", n.nodes, LIMparam = 1e-3, 
      LIMlogl = 1e-3, LIMderiv = 1e-3, print.times = TRUE)

Arguments

formula

a formula object, with the response on the left of a $\texttildelow$ operator, and the terms on the right. The response must be a survival object as returned by the 'Surv' function like in survival package. Interactions are possible u

formula.LongitudinalData

a formula object, only requires terms on the right to indicate which variables are modelling the longitudinal outcome. It must follow the standard form used for linear mixed-effects models. Interactions are possible using * or :.

data

a 'data.frame' with the variables used in formula.

data.Longi

a 'data.frame' with the variables used in formula.LongitudinalData.

random

Names of variables for the random effects of the longitudinal outcome. Maximum 2 random effects are possible at the moment. The random intercept is chosen using "1".

Name of the variable representing the individuals.

intercept

Logical value. Is the fixed intercept of the biomarker included in the mixed-effects model? The default is TRUE.

link

Type of link function for the dependence between the biomarker and death: "Random-effects" for the association directly via the random effects of the biomarker, "Current-level" for the association via the true current level of th

left.censoring

Is the biomarker left-censored below a threshold $s$? The default is FALSE, ie. no left-censoring. In case of a left-censored biomarker, this argument must be equal to the threshold $s$.

n.knots

Integer giving the number of knots to use. Value required in the penalized likelihood estimation. It corresponds to the (n.knots+2) splines functions for the approximation of the hazard or the survival functions. We estimat

kappa

Positive smoothing parameter in the penalized likelihood estimation. The coefficient kappa of the integral of the squared second derivative of hazard function in the fit (penalized log likelihood). To obtain an initial value for

maxit

Maximum number of iterations for the Marquardt algorithm. The default is 350.

hazard

Type of hazard functions: "Splines" for semiparametric hazard functions using equidistant intervals or "Splines-per" using percentile with the penalized likelihood estimation, "Weibull" for the parametric

init.B

Vector of initial values for regression coefficients. This vector should be of the same size as the whole vector of covariates with the first elements for the covariates related to the terminal event and then for the covariates related to the biomarker (i

init.Random

Initial value for variance of the elements of the matrix of the distribution of the random effects. Default is 0.5 for each element.

init.Eta

Initial values for regression coefficients for the link function. Default is 0.5 for each.

method.GH

Method for the Gauss-Hermite quadrature: "Standard" for the standard non-adaptive Gaussian quadrature, "Pseudo-adaptive" for the pseudo-adaptive Gaussian quadrature and "HRMSYM" for the algorithm for the multivariate

n.nodes

Number of nodes for the Gauss-Hermite quadrature. They can be chosen among 5, 7, 9, 12, 15, 20 and 32. The default is 9.

LIMparam

Convergence threshold of the Marquardt algorithm for the parameters (see Details of frailtyPenal function), $10^{-3}$ by default.

LIMlogl

Convergence threshold of the Marquardt algorithm for the log-likelihood (see Details of frailtyPenal function), $10^{-3}$ by default.

LIMderiv

Convergence threshold of the Marquardt algorithm for the gradient (see Details of frailtyPenal function), $10^{-3}$ by default.

print.times

a logical parameter to print iteration process. The default is TRUE.

Value

The following components are included in a 'longiPenal' object for each model:
bThe sequence of the corresponding estimation of the coefficients for the hazard functions (parametric or semiparametric), the random effects variances and the regression coefficients.
callThe code used for the model.
formulaThe formula part of the code used for the terminal event part of the model.
formula.LongitudinalDataThe formula part of the code used for the longitudinal part of the model.
coefThe regression coefficients (first for the terminal event and then for the biomarker.
groupsThe number of groups used in the fit.
kappaThe value of the smoothing parameter in the penalized likelihood estimation corresponding to the baseline hazard function for the terminal event.
logLikPenalThe complete marginal penalized log-likelihood in the semiparametric case.
logLikThe marginal log-likelihood in the parametric case.
n.measurementsThe number of biomarker observations used in the fit.
max_repThe maximal number of repeated measurements per individual.
n.deathsThe number of events observed in the fit.
n.iterThe number of iterations needed to converge.
n.knotsThe number of knots for estimating the baseline hazard function in the penalized likelihood estimation.
n.stratThe number of stratum.
varHThe variance matrix of all parameters (before positivity constraint transformation for the variance of the measurement error, for which the delta method is used).
varHIHThe robust estimation of the variance matrix of all parameters.
xDThe vector of times where both survival and hazard function of the terminal event are estimated. By default seq(0,max(time),length=99), where time is the vector of survival times.
lamDThe array (dim=3) of baseline hazard estimates and confidence bands (terminal event).
survDThe array (dim=3) of baseline survival estimates and confidence bands (terminal event).
typeofThe type of the baseline hazard functions (0:"Splines", "2:Weibull").
nparThe number of parameters.
nvarThe vector of number of explanatory variables for the terminal event and biomarker.
nvarEndThe number of explanatory variables for the terminal event.
nvarYThe number of explanatory variables for the biomarker.
noVarEndThe indicator of absence of the explanatory variables for the terminal event.
noVarYThe indicator of absence of the explanatory variables for the biomarker.
LCVThe approximated likelihood cross-validation criterion in the semiparametric case (with H minus the converged Hessian matrix, and l(.) the full log-likelihood).$$LCV=\frac{1}{n}(trace(H^{-1}_{pl}H) - l(.))$$
AICThe Akaike information Criterion for the parametric case.$$AIC=\frac{1}{n}(np - l(.))$$
n.knots.tempThe initial value for the number of knots.
shape.weibThe shape parameter for the Weibull hazard function.
scale.weibThe scale parameter for the Weibull hazard function.
martingaledeath.resThe martingale residuals for each individual.
conditional.resThe conditional residuals for the biomarker (subject-specific): $\bold{R}_i^{(m)}=\bold{y}_i-\bold{X}_{Li}^\top\widehat{\bold{\beta}}_L-\bold{Z}_i^\top\widehat{\bold{b}}_i$.
marginal.resThe marginal residuals for the biomarker (population averaged): $\bold{R}_i^{(c)}=\bold{y}_i-\bold{X}_{Li}^\top\widehat{\bold{\beta}}_L$.
marginal_chol.resThe Cholesky marginal residuals for the biomarker: $\bold{R}_i^{(m)}=\widehat{\bold{U}_i^{(m)}}\bold{R}_i^{(m)}$, where $\widehat{\bold{U}_i^{(m)}}$ is an upper-triangular matrix obtained by the Cholesky decomposition of the variance matrix $\bold{V}_{\bold{R}_i^{(m)}}=\widehat{\bold{V}_i}-\bold{X}_{Li}(\sum_{i=1}^N\bold{X}_{Li}\widehat{\bold{V}_i}^{-1}\bold{X}_{Li})^{-1}\bold{X}_{Li}^\top$.
conditional_st.resThe standardized conditional residuals for the biomarker.
marginal_st.resThe standardized marginal residuals for the biomarker.
random.effects.predThe empirical Bayes predictions of the random effects (ie. using conditional posterior distributions).
pred.y.margThe marginal predictions of the longitudinal outcome.
pred.y.condThe conditional (given the random effects) predictions of the longitudinal outcome.
lineardeath.predThe linear predictor for the terminal part.
global_chisq_dThe vector with values of each multivariate Wald test for the terminal part.
dof_chisq_dThe vector with degrees of freedom for each multivariate Wald test for the terminal part.
global_chisq.test_dThe binary variable equals to 0 when no multivariate Wald is given, 1 otherwise (for the terminal part).
p.global_chisq_dThe vector with the p_values for each global multivariate Wald test for the terminal part.
global_chisqThe vector with values of each multivariate Wald test for the longitudinal part.
dof_chisqThe vector with degrees of freedom for each multivariate Wald test for the longitudinal part.
global_chisq.testThe binary variable equals to 0 when no multivariate Wald is given, 1 otherwise (for the longitudinal part).
p.global_chisqThe vector with the p_values for each global multivariate Wald test for the longitudinal part.
names.factordcThe names of the "as.factor" variables for the terminal part.
names.factorThe names of the "as.factor" variables for the longitudinal part.
interceptThe logical value. Is the fixed intercept included in the linear mixed-effects model?
B1The variance matrix of the random effects for the longitudinal outcome.
ResidualSEThe standard deviation of the measurement error.
etaThe regression coefficients for the link function.
ne_reThe number of random effects used in the fit.
names.reThe names of variables for the random effects.
linkThe name of the type of the link function.
leftCensoringThe logical value. Is the longitudinal outcome left-censored?
leftCensoring.thresholdFor the left-censored biomarker, the value of the left-censoring threshold used for the fit.
prop.censoredThe fraction of observations subjected to the left-censoring.
methodGHThe Guassian quadrature method used in the fit.
n.nodesThe number of nodes used for the Gaussian quadrature in the fit.

Details

Typical usage for the joint model longiPenal(Surv(time,event)~var1+var2, biomarker ~ var1+var2, data, data.Longi, ...)

The method of the Gauss-Hermite quadrature for approximations of the multidimensional integrals, i.e. length of random is 2, can be chosen among the standard, non-adaptive, pseudo-adaptive in which the quadrature points are transformed using the information from the fitted mixed-effects model for the biomarker (Rizopoulos 2012) or multivariate non-adaptive procedure proposed by Genz et al. 1996 and implemented in FORTRAN subroutine HRMSYM. The choice of the method is important for estimations. The standard non-adaptive Gauss-Hermite quadrature ("Standard") with a specific number of points gives accurate results but can be time consuming. The non-adaptive procedure ("HRMSYM") offers advantageous computational time but in case of datasets in which some individuals have few repeated observations (biomarker measures or recurrent events), this method may be moderately unstable. The pseudo-adaptive quadrature uses transformed quadrature points to center and scale the integrand by utilizing estimates of the random effects from an appropriate linear mixed-effects model. This method enables using less quadrature points while preserving the estimation accuracy and thus lead to a better computational time.

NOTE. Data frames data and data.Longi must be consistent. Names and types of corresponding covariates must be the same, as well as the number and identification of individuals.

References

A. Genz and B. Keister (1996). Fully symmetric interpolatory rules for multiple integrals over infinite regions with Gaussian weight. Journal of Computational and Applied Mathematics 71, 299-309.

A. Krol, L. Ferrer, JP. Pignon, C. Proust-Lima, M. Ducreux, O. Bouche, S. Michiels, V. Rondeau (2015). Joint Model for Left-Censored Longitudinal Data, Recurrent Events and Terminal Event: Predictive Abilities of Tumor Burden for Cancer Evolution with Application to the FFCD 2000-05 Trial. Submitted.

D. Rizopoulos (2012). Fast fitting of joint models for longitudinal and event time data using a pseudo-adaptive Gaussian quadrature rule. Computational Statistics and Data Analysis 56, 491-501.

M.S. Wulfsohn, A.A. and Tsiatis, A. A. (1997). A joint model for survival and longitudinal data measured with error. Biometrics 53, 330-9.

Examples

Run this code

###--- Joint model for longitudinal data and a terminal event ---###

data(colorectal)
data(colorectalLongi)

# Survival data preparation - only terminal events 
colorectalSurv <- subset(colorectal, new.lesions == 0)

# Baseline hazard function approximated with splines
# Random effects as the link function

model.spli.RE <- longiPenal(Surv(time1, state) ~ age + treatment + who.PS 
+ prev.resection, tumor.size ~  year * treatment + age + who.PS ,
colorectalSurv,	data.Longi = colorectalLongi, random = c("1", "year"),
id = "id", link = "Random-effects", left.censoring = -3.33, 
n.knots = 7, kappa = 2)

# Weibull baseline hazard function
# Current level of the biomarker as the link function

model.weib.CL <- longiPenal(Surv(time1, state) ~ age + treatment + who.PS
+ prev.resection, tumor.size ~  year * treatment + age + who.PS , 
colorectalSurv, data.Longi = colorectalLongi, random = c("1", "year"),
id = "id", link = "Current-level", left.censoring = -3.33, hazard = "Weibull")

Run the code above in your browser using DataLab