trivPenal: Fit a Trivariate Joint Model for Longitudinal Data, Recurrent Events and a Terminal Event

Description

Fit a trivariate joint model for longitudinal data, recurrent events and a terminal event using a semiparametric penalized likelihood estimation or a parametric estimation on the hazard functions.

The longitudinal outcomes y(t) (k=1,...,n, i=1,...,N) for N subjects are described by a linear mixed model and the risks of the recurrent and terminal events are represented by proportional hazard risk models. The joint model is constructed assuming that the processes are linked via a latent structure (Krol et al. 2015):

where $X$(t), $X$(t) and $X$(t) are vectors of fixed effects covariates and $\beta$, $\beta$ and $\beta$ are the associated coefficients. Measurements errors $\epsilon$(t) are iid normally distributed with mean 0 and variance $\sigma$. The random effects $b$ = ($b$,...,$b$) $N$(0,$B$) are associated to covariates $Z$(t) and independent from the measurement error. The relationship between the biomarker and recurrent events is explained via g($b$,$\beta$,$Z$(t),$X$(t)) with coefficients $\eta$ and between the biomarker and terminal event is explained via h($b$,$\beta$,$Z$(t),$X$(t)) with coefficients $\eta$. Two forms of the functions g(.) and h(.) are available: the random effects $b$ and the current biomarker level $m$(t)=$X$(t)$\beta$ + $Z$(t)$b$. The frailty term v is gaussian with mean 0 and variance $\sigma$. Together with $b$ constitutes the random effects of the model:

We consider that the longitudinal outcome can be a subject to a quantification limit, i.e. some observations, below a level of detection $s$ cannot be quantified (left-censoring).

Usage

trivPenal(formula, formula.terminalEvent, formula.LongitudinalData, data,
data.Longi, random, id, intercept = TRUE, link = "Random-effects",
left.censoring = FALSE, recurrentAG = FALSE, n.knots, kappa, maxit = 300,
hazard = "Splines", init.B, init.Random, init.Eta, init.Alpha, method.GH =
"Standard", n.nodes, LIMparam = 1e-3, LIMlogl = 1e-3, LIMderiv = 1e-3,
print.times = TRUE)

Arguments

formula

a formula object, with the response on the left of a $\sim$ operator, and the terms on the right. The response must be a survival object as returned by the 'Surv' function like in survival package. Interactions are possible using * or :.

formula.terminalEvent

A formula object, only requires terms on the right to indicate which variables are modelling the terminal event. Interactions are possible using * or :.

formula.LongitudinalData

A formula object, only requires terms on the right to indicate which variables are modelling the longitudinal outcome. It must follow the standard form used for linear mixed-effects models. Interactions are possible using * or :.

data

A 'data.frame' with the variables used in formula.

data.Longi

A 'data.frame' with the variables used in formula.LongitudinalData.

random

Names of variables for the random effects of the longitudinal outcome. Maximum 3 random effects are possible at the moment. The random intercept is chosen using "1".

Name of the variable representing the individuals.

intercept

Logical value. Is the fixed intercept of the biomarker included in the mixed-effects model? The default is TRUE.

link

Type of link functions for the dependence between the biomarker and death and between the biomarker and the recurrent events: "Random-effects" for the association directly via the random effects of the biomarker, "Current-level" for the association via the true current level of the biomarker. The option "Current-level" can be chosen only if the biomarker random effects are associated with the intercept and time (following this order). The default is "Random-effects".

left.censoring

Is the biomarker left-censored below a threshold $s$? If there is no left-censoring, the argument must be equal to FALSE, otherwise the value of the threshold must be given.

recurrentAG

Logical value. Is Andersen-Gill model fitted? If so indicates that recurrent event times with the counting process approach of Andersen and Gill is used. This formulation can be used for dealing with time-dependent covariates. The default is FALSE.

n.knots

Integer giving the number of knots to use. Value required in the penalized likelihood estimation. It corresponds to the (n.knots+2) splines functions for the approximation of the hazard or the survival functions. We estimate I or M-splines of order 4. When the user set a number of knots equals to k (n.knots=k) then the number of interior knots is (k-2) and the number of splines is (k-2)+order. Number of knots must be between 4 and 20. (See Note in frailtyPenal function)

kappa

Positive smoothing parameters in the penalized likelihood estimation. The coefficient kappa of the integral of the squared second derivative of hazard function in the fit (penalized log likelihood). To obtain an initial value for kappa, a solution is to fit the corresponding Cox model using cross validation (See cross.validation in function frailtyPenal). We advise the user to identify several possible tuning parameters, note their defaults and look at the sensitivity of the results to varying them.

maxit

Maximum number of iterations for the Marquardt algorithm. Default is 300

hazard

Type of hazard functions: "Splines" for semiparametric hazard functions using equidistant intervals or "Splines-per" using percentile with the penalized likelihood estimation, "Weibull" for the parametric Weibull functions. The default is "Splines".

init.B

Vector of initial values for regression coefficients. This vector should be of the same size as the whole vector of covariates with the first elements for the covariates related to the recurrent events, then to the terminal event and then to the biomarker (interactions in the end of each component). Default is 0.5 for each.

init.Random

Initial value for variance of the elements of the matrix of the distribution of the random effects.

init.Eta

Initial values for regression coefficients for the link functions, first for the recurrent events ($\bold{\eta}_R$) and for the terminal event ($\bold{\eta}_T$).

init.Alpha

Initial value for parameter alpha

method.GH

Method for the Gauss-Hermite quadrature: "Standard" for the standard non-adaptive Gaussian quadrature, "Pseudo-adaptive" for the pseudo-adaptive Gaussian quadrature and "HRMSYM" for the algorithm for the multivariate non-adaptive Gaussian quadrature (see Details). The default is "Standard".

n.nodes

Number of nodes for the Gauss-Hermite quadrature. They can be chosen amon 5, 7, 9, 12, 15, 20 and 32. The default is 9.

LIMparam

Convergence threshold of the Marquardt algorithm for the parameters (see Details), $10^{-3}$ by default.

LIMlogl

Convergence threshold of the Marquardt algorithm for the log-likelihood (see Details), $10^{-3}$ by default.

LIMderiv

Convergence threshold of the Marquardt algorithm for the gradient (see Details), $10^{-3}$ by default.

print.times

a logical parameter to print iteration process. Default is TRUE.

Value

The following components are included in a 'trivPenal' object for each model:

The sequence of the corresponding estimation of the coefficients for the hazard functions (parametric or semiparametric), the random effects variances and the regression coefficients.

call

The code used for the model.

formula

The formula part of the code used for the recurrent event part of the model.

formula.terminalEvent

The formula part of the code used for the terminal event part of the model.

formula.LongitudinalData

The formula part of the code used for the longitudinal part of the model.

coef

The regression coefficients (first for the recurrent events, then for the terminal event and then for the biomarker.

groups

The number of groups used in the fit.

kappa

The values of the smoothing parameters in the penalized likelihood estimation corresponding to the baseline hazard functions for the recurrent and terminal events.

logLikPenal

The complete marginal penalized log-likelihood in the semiparametric case.

logLik

The marginal log-likelihood in the parametric case.

n.measurements

The number of biomarker observations used in the fit.

max_rep

The maximal number of repeated measurements per individual.

The number of observations in 'data' (recurrent and terminal events) used in the fit.

n.events

The number of recurrent events observed in the fit.

n.deaths

The number of terminal events observed in the fit.

n.iter

The number of iterations needed to converge.

n.knots

The number of knots for estimating the baseline hazard function in the penalized likelihood estimation.

n.strat

The number of stratum.

varH

The variance matrix of all parameters (before positivity constraint transformation for the variance of the measurement error, for which the delta method is used).

varHIH

The robust estimation of the variance matrix of all parameters.

The vector of times where both survival and hazard function of the recurrent events are estimated. By default seq(0,max(time),length=99), where time is the vector of survival times.

lamR

The array (dim=3) of baseline hazard estimates and confidence bands (recurrent events).

survR

The array (dim=3) of baseline survival estimates and confidence bands (recurrent events).

The vector of times where both survival and hazard function of the terminal event are estimated. By default seq(0,max(time),length=99), where time is the vector of survival times.

lamD

The array (dim=3) of baseline hazard estimates and confidence bands.

survD

The array (dim=3) of baseline survival estimates and confidence bands.

medianR

The value of the median survival and its confidence bands for the recurrent event.

medianD

The value of the median survival and its confidence bands for the terminal event.

typeof

The type of the baseline hazard function (0:"Splines", "2:Weibull").

npar

The number of parameters.

nvar

The vector of number of explanatory variables for the recurrent events, terminal event and biomarker.

nvarRec

The number of explanatory variables for the recurrent events.

nvarEnd

The number of explanatory variables for the terminal event.

nvarY

The number of explanatory variables for the biomarker.

noVarRec

The indicator of absence of the explanatory variables for the recurrent events.

noVarEnd

The indicator of absence of the explanatory variables for the terminal event.

noVarY

The indicator of absence of the explanatory variables for the biomarker.

LCV

The approximated likelihood cross-validation criterion in the semiparametric case (with H minus the converged Hessian matrix, and l(.) the full log-likelihood).$$LCV=\frac{1}{n}(trace(H^{-1}_{pl}H) - l(.))$$

AIC

The Akaike information Criterion for the parametric case.$$AIC=\frac{1}{n}(np - l(.))$$

n.knots.temp

The initial value for the number of knots.

shape.weib

The shape parameter for the Weibull hazard functions (the first element for the recurrences and the second one for the terminal event).

scale.weib

The scale parameter for the Weibull hazard functions (the first element for the recurrences and the second one for the terminal event).

martingale.res

The martingale residuals related to the recurrences for each individual.

martingaledeath.res

The martingale residuals related to the terminal event for each individual.

conditional.res

The conditional residuals for the biomarker (subject-specific): $\bold{R}_i^{(m)}=\bold{y}_i-\bold{X}_{Li}^\top\widehat{\bold{\beta}}_L-\bold{Z}_i^\top\widehat{\bold{b}}_i$.

marginal.res

The marginal residuals for the biomarker (population averaged): $\bold{R}_i^{(c)}=\bold{y}_i-\bold{X}_{Li}^\top\widehat{\bold{\beta}}_L$.

marginal_chol.res

The Cholesky marginal residuals for the biomarker: $\bold{R}_i^{(m)}=\widehat{\bold{U}_i^{(m)}}\bold{R}_i^{(m)}$, where $\widehat{\bold{U}_i^{(m)}}$ is an upper-triangular matrix obtained by the Cholesky decomposition of the variance matrix $\bold{V}_{\bold{R}_i^{(m)}}=\widehat{\bold{V}_i}-\bold{X}_{Li}(\sum_{i=1}^N\bold{X}_{Li}\widehat{\bold{V}_i}^{-1}\bold{X}_{Li})^{-1}\bold{X}_{Li}^\top$.

conditional_st.res

The standardized conditional residuals for the biomarker.

marginal_st.res

The standardized marginal residuals for the biomarker.

random.effects.pred

The empirical Bayes predictions of the random effects (ie. using conditional posterior distributions).

frailty.pred

The empirical Bayes predictions of the frailty term (ie. using conditional posterior distributions).

pred.y.marg

The marginal predictions of the longitudinal outcome.

pred.y.cond

The conditional (given the random effects) predictions of the longitudinal outcome.

linear.pred

The linear predictor for the recurrent events part.

lineardeath.pred

The linear predictor for the terminal event part.

global_chisqR

The vector with values of each multivariate Wald test for the recurrent part.

dof_chisqR

The vector with degrees of freedom for each multivariate Wald test for the recurrent part.

global_chisq.testR

The binary variable equals to 0 when no multivariate Wald is given, 1 otherwise (for the recurrent part).

p.global_chisqR

The vector with the p_values for each global multivariate Wald test for the recurrent part.

global_chisqT

The vector with values of each multivariate Wald test for the terminal part.

dof_chisqT

The vector with degrees of freedom for each multivariate Wald test for the terminal part.

global_chisq.testT

The binary variable equals to 0 when no multivariate Wald is given, 1 otherwise (for the terminal part).

p.global_chisqT

The vector with the p_values for each global multivariate Wald test for the terminal part.

global_chisqY

The vector with values of each multivariate Wald test for the longitudinal part.

dof_chisqY

The vector with degrees of freedom for each multivariate Wald test for the longitudinal part.

global_chisq.testY

The binary variable equals to 0 when no multivariate Wald is given, 1 otherwise (for the longitudinal part).

p.global_chisqY

The vector with the p_values for each global multivariate Wald test for the longitudinal part.

names.factorR

The names of the "as.factor" variables for the recurrent part.

names.factorT

The names of the "as.factor" variables for the terminal part.

names.factorY

The names of the "as.factor" variables for the longitudinal part.

The logical value. Is Andersen-Gill model fitted?

intercept

The logical value. Is the fixed intercept included in the linear mixed-effects model?

The variance matrix of the random effects for the longitudinal outcome.

sigma2

The variance of the frailty term ($\sigma_v$).

alpha

The coefficient $\alpha$ associated with the frailty parameter in the terminal hazard function.

ResidualSE

The variance of the measurement error.

etaR

The regression coefficients for the link function $g(\cdot)$.

etaT

The regression coefficients for the link function $h(\cdot)$.

ne_re

The number of random effects b used in the fit.

names.re

The names of variables for the random effects $\bold{b}_i$.

link

The name of the type of the link functions.

leftCensoring

The logical value. Is the longitudinal outcome left-censored?

leftCensoring.threshold

For the left-censored biomarker, the value of the left-censoring threshold used for the fit.

prop.censored

The fraction of observations subjected to the left-censoring.

methodGH

The Gaussian quadrature method used in the fit.

n.nodes

The number of nodes used for the Gaussian quadrature in the fit.

alpha_p.value

p-value of the Wald test for the estimated coefficient $\alpha$.

sigma2_p.value

p-value of the Wald test for the estimated variance of the frailty term ($\sigma_v$).

etaR_p.value

p-values of the Wald test for the estimated regression coefficients for the link function $g(\cdot)$.

etaT_p.value

p-values of the Wald test for the estimated regression coefficients for the link function $h(\cdot)$.

beta_p.value

p-values of the Wald test for the estimated regression coefficients.

Details

Typical usage for the joint model

trivPenal(Surv(time,event)~cluster(id) + var1 + var2 +
terminal(death), formula.terminalEvent =~ var1 + var3, biomarker ~
var1+var2, data, data.Longi, ...)

The method of the Gauss-Hermite quadrature for approximations of the multidimensional integrals, i.e. length of random is 2, can be chosen among the standard, non-adaptive, pseudo-adaptive in which the quadrature points are transformed using the information from the fitted mixed-effects model for the biomarker (Rizopoulos 2012) or multivariate non-adaptive procedure proposed by Genz et al. 1996 and implemented in FORTRAN subroutine HRMSYM. The choice of the method is important for estimations. The standard non-adaptive Gauss-Hermite quadrature ("Standard") with a specific number of points gives accurate results but can be time consuming. The non-adaptive procedure ("HRMSYM") offers advantageous computational time but in case of datasets in which some individuals have few repeated observations (biomarker measures or recurrent events), this method may be moderately unstable. The pseudo-adaptive quadrature uses transformed quadrature points to center and scale the integrand by utilizing estimates of the random effects from an appropriate linear mixed-effects model (this transformation does not include the frailty in the trivariate model, for which the standard method is used). This method enables using less quadrature points while preserving the estimation accuracy and thus lead to a better computational time.

NOTE. Data frames data and data.Longi must be consistent. Names and types of corresponding covariates must be the same, as well as the number and identification of individuals.

References

A. Krol, A. Mauguen, Y. Mazroui, A. Laurent, S. Michiels and V. Rondeau (2017). Tutorial in Joint Modeling and Prediction: A Statistical Software for Correlated Longitudinal Outcomes, Recurrent Events and a Terminal Event. Journal of Statistical Software 81(3), 1-52.

A. Krol, L. Ferrer, JP. Pignon, C. Proust-Lima, M. Ducreux, O. Bouche, S. Michiels, V. Rondeau (2016). Joint Model for Left-Censored Longitudinal Data, Recurrent Events and Terminal Event: Predictive Abilities of Tumor Burden for Cancer Evolution with Application to the FFCD 2000-05 Trial. Biometrics 72(3) 907-16.

D. Rizopoulos (2012). Fast fitting of joint models for longitudinal and event time data using a pseudo-adaptive Gaussian quadrature rule. Computational Statistics and Data Analysis 56, 491-501.

A. Genz and B. Keister (1996). Fully symmetric interpolatory rules for multiple integrals over infinite regions with Gaussian weight. Journal of Computational and Applied Mathematics 71, 299-309.

Examples

Run this code

# NOT RUN {

# }
# NOT RUN {
###--- Trivariate joint model for longitudinal data, ---###
###--- recurrent events and a terminal event ---###

data(colorectal)
data(colorectalLongi)

# Parameter initialisation for covariates - longitudinal and terminal part

# Survival data preparation - only terminal events 
colorectalSurv <- subset(colorectal, new.lesions == 0)

initial.longi <- longiPenal(Surv(time1, state) ~ age + treatment + who.PS 
+ prev.resection, tumor.size ~  year * treatment + age + who.PS ,
colorectalSurv,	data.Longi = colorectalLongi, random = c("1", "year"),
id = "id", link = "Random-effects", left.censoring = -3.33, 
n.knots = 6, kappa = 2, method.GH="Pseudo-adaptive",
 maxit=40, n.nodes=7)


# Parameter initialisation for covariates - recurrent part
initial.frailty <- frailtyPenal(Surv(time0, time1, new.lesions) ~ cluster(id)
+ age + treatment + who.PS, data = colorectal,
recurrentAG = TRUE, RandDist = "LogN", n.knots = 6, kappa =2)


# Baseline hazard function approximated with splines
# Random effects as the link function, Calendar timescale
# (computation takes around 40 minutes)

model.spli.RE.cal <-trivPenal(Surv(time0, time1, new.lesions) ~ cluster(id)
+ age + treatment + who.PS +  terminal(state),
formula.terminalEvent =~ age + treatment + who.PS + prev.resection, 
tumor.size ~ year * treatment + age + who.PS, data = colorectal, 
data.Longi = colorectalLongi, random = c("1", "year"), id = "id", 
link = "Random-effects", left.censoring = -3.33, recurrentAG = TRUE,
n.knots = 6, kappa=c(0.01, 2), method.GH="Standard", n.nodes = 7,
init.B = c(-0.07, -0.13, -0.16, -0.17, 0.42, #recurrent events covariates
-0.16, -0.14, -0.14, 0.08, 0.86, -0.24, #terminal event covariates
2.93, -0.28, -0.13, 0.17, -0.41, 0.23, 0.97, -0.61)) #biomarker covariates



# Weibull baseline hazard function
# Random effects as the link function, Gap timescale
# (computation takes around 30 minutes)
model.weib.RE.gap <-trivPenal(Surv(gap.time, new.lesions) ~ cluster(id)
+ age + treatment + who.PS + prev.resection + terminal(state),
formula.terminalEvent =~ age + treatment + who.PS + prev.resection, 
tumor.size ~ year * treatment + age + who.PS, data = colorectal,
data.Longi = colorectalLongi, random = c("1", "year"), id = "id", 
link = "Random-effects", left.censoring = -3.33, recurrentAG = FALSE,
hazard = "Weibull", method.GH="Pseudo-adaptive",n.nodes=7)


# }
# NOT RUN {

# }

Run the code above in your browser using DataLab