lcmm
. It fits multivariate mixed models and multivariate latent class mixed models for multivariate longitudinal outcomes of different types. It handles continuous longitudinal outcomes (Gaussian or non-Gaussian, curvilinear) as well as bounded quantitative and discrete longitudinal outcomes. Next version will also handle ordinal outcomes.
The model assumes that all the outcomes measure the same underlying latent process defined as their common factor, and each outcome is related to this latent common factor by a specific parameterized link function.
At the latent process level, the model estimates a standard linear mixed model or a latent class linear mixed model when heterogeneity in the population is investigated (in the same way as in function hlme
).
Parameters of the nonlinear link functions and of the latent process mixed model are estimated simultaneously using a maximum likelihood method.multlcmm(fixed, mixture, random, subject, classmb, ng = 1,
idiag = FALSE,nwg = FALSE, randomY=FALSE, link = "linear",
intnodes = NULL, epsY = 0.5, cor=NULL, data, B, convB = 1e-04,
convL = 1e-04, convG = 1e-04, maxiter=100,
nsim=100, prior,range=NULL, subset=NULL, na.action=1,
posfix=NULL, partialH=FALSE)
+
on the left of ~
and the covariates are separated by +
fixed
, the covariates with class-speci+
.
By default, an intercept i+
.
No intercept should be included in this formula.ng=1
no mixture
nor classmb
should be specified. If ng>1
, mixture
is required.FALSE
, a non structured matrix of variance-covariance is considered (by default).
If TRUE
a diagonal matrix of variance-covariance is considered.FALSE
the variance-covariance matrix is common over latent classes (by default).
If TRUE
a class-specific proportional parameter multiplies the variFALSE
no outcome-specific random intercept is added (default). If TRUE
independent outcome-specific random intercepts with parameterized variance are includefixed
, mixture
, random
, classmb
and subject
.details
section).
(2) nothing is specifieddata
to use. By default, all lines.B
and detailed in section details
Best
with exception for variance-covariance parameters of the random-effects for which V
contains the variance-covariance estimates of the Cholesky transformed parameters displayed in cholesky
multlcmm
function estimates multivariate latent class mixed models for different types of outcomes by assuming a parameterized link function for linking each outcome Y_k(t) with the underlying latent common factor L(t) they measure. To fix the latent process dimension, we chose to constrain at the latent process level the (first) intercept of the latent class mixed model at 0 and the standard error of the first random effect at 1.
1. With the "linear" link function, 2 parameters are required for the following transformation (Y(t) - b1)/b2
2. With the "beta" link function, 4 parameters are required for the following transformation: [ h(Y(t)',b1,b2) - b3]/b4 where h is the Beta CDF with canonical parameters c1 and c2 that can be derived from b1 and b2 as c1=exp(b1)/[exp(b2)*(1+exp(b1))] and c2=1/[exp(b2)*(1+exp(b1))], and Y(t)' is the rescaled outcome i.e. Y(t)'= [ Y(t) - min(Y(t)) + epsY ] / [ max(Y(t)) - min(Y(t)) +2*epsY ].
3. With the "splines" link function, n+2 parameters are required for the following transformation b_1 + b_2*I_1(Y(t)) + ... + b_{n+2} I_{n+1}(Y(t)), where I_1,...,I_{n+1} is the basis of quadratic I-splines. To constraint the parameters to be positive, except for b_1, the program estimates b_k^* (for k=2,...,n+2) so that b_k=(b_k^*)^2. This parameterization may lead in some cases to problems of convergence that we are currently addressing.
Details of these parameterized link functions can be found in the papers: Proust-Lima et al. (Biometrics 2006) and Proust-Lima et al. (BJMSP 2013).
B. THE VECTOR OF PARAMETERS B
The parameters in the vector of initial values B
or in the vector of maximum likelihood estimates best
are included in the following order:
(1) ng-1 parameters are required for intercepts in the latent class membership model, and if covariates are included in classmb
, ng-1 paramaters should be entered for each one;
(2) for all covariates in fixed
, one parameter is required if the covariate is not in mixture
, ng paramaters are required if the covariate is also in mixture
; When ng=1, the intercept is not estimated and no parameter should be specified in B
. When ng>1, the first intercept is not estimated and only ng-1 parameters should be specified in B
;
(3) for all covariates included with contrast()
in fixed
, one supplementary parameter per outcome is required excepted for the last outcome for which the parameter is not estimated but deduced from the others;
(4) if idiag=TRUE
, the variance of each random-effect specified in random
is required excepted the first one (usually the intercept) which is constrained to 1.
(5) if idiag=FALSE
, the inferior triangular variance-covariance matrix of all the random-effects is required excepted the first variance (usually the intercept) which is constrained to 1.
(5) only if nwg=TRUE
and ng
>1, ng-1 parameters for class-specific proportional coefficients
for the variance covariance matrix of the random-effects;
(6) if cor
is specified, the standard error of the Brownian motion or the standard error and the correlation parameter of the autoregressive process;
(7) the standard error of the outcome-specific Gaussian errors (one per outcome);
(8) if randomY=TRUE
, the standard error of the outcome-specific random intercept (one per outcome);
(9) the parameters of each parameterized link function: 2 for "linear", 4 for "beta", n+2 for "splines" with n nodes.
We understand that it can be difficult to enter the correct number of parameters in B
at the first place. So we recommend to run the program without specifying the initial vector B
even if this model does not converge (maxiter can be even at 1 to run only 1 iteration of the optimization program). As the final vector best
has exactly the same structure as B
(even when the program stops without convergence), it will help defining a satisfying vector of initial values B
for next runs.
C. CAUTIONS REGARDING THE USE OF THE PROGRAM
Some caution should be made when using the program. Convergence criteria are very strict as they are based on the derivatives of the log-likelihood in addition to the parameter and log-likelihood stability. In some cases, the program may not converge and reach the maximum number of iterations fixed at 100. In this case, the user should check that parameter estimates at the last iteration are not on the boundaries of the parameter space.
If the parameters are on the boundaries of the parameter space, the identifiability of the model is critical. This may happen especially with splines parameters that may be too close to 0 (lower boundary) or classmb parameters that are too high or low (perfect classification). When identifiability of some parameters is suspected, the program can be run again from the former estimates by fixing the suspected parameters to their value with option posfix. This usually solves the problem. An alternative is to remove the parameters of the Beta of Splines link function from the inverse of the Hessian with option partialH.
If not, the program should be run again with other initial values, with a higher maximum number of iterations or less strict convergence tolerances.
Specifically when investigating heterogeneity (that is with ng>1):
(1) As the log-likelihood of a latent class model can have multiple maxima, a careful choice of the initial values is crucial for ensuring convergence toward the global maximum.
The program can be run without entering the vector of initial values (see point 2).
However, we recommend to systematically enter initial values in B
and try different sets of initial values.
(2) The automatic choice of initial values we provide requires the estimation of a preliminary linear mixed model. The user should be aware that first, this preliminary analysis can take time for large datatsets and second,
that the generated initial values can be very not likely and even may converge slowly to a local maximum.
This is a reason why specification of initial values in B
should be favoured.
Proust and Jacqmin-Gadda (2005). Estimation of linear mixed models with a mixture of distribution for the random-effects. Comput Methods Programs Biomed 78: 165-73.
Proust, Jacqmin-Gadda, Taylor, Ganiayre, and Commenges (2006). A nonlinear model with latent process for cognitive evolution using multivariate longitudinal data. Biometrics 62, 1014-24.
Proust-Lima, Dartigues and Jacqmin-Gadda (2011). Misuse of the linear mixed model when evaluating risk factors of cognitive decline. Amer J Epidemiol 174(9): 1077-88.
Proust-Lima, Amieva, Jacqmin-Gadda (2013). Analysis of multivariate mixed longitudinal data: A flexible latent process approach. Br J Math Stat Psychol 66(3): 470-87.
Commenges, Proust-Lima, Samieri, Liquet (2012). A universal approximate cross-validation criterion and its asymptotic distribution, Arxiv.
postprob
, plot.multlcmm
, predictL
, predictY
lcmm
data(data_Jointlcmm)
# Latent process mixed model for two curvilinear outcomes. Link functions are
# aproximated by I-splines, the first one has 3 nodes (i.e. 1 internal node 8),
# the second one has 4 nodes (i.e. 2 internal nodes 12,25)
m1 <- multlcmm(Ydep1+Ydep2~1+Time*X2+contrast(X2),random=~1+Time,
subject="ID",randomY=TRUE,link=c("4-manual-splines","3-manual-splines"),
intnodes=c(8,12,25),data=data_Jointlcmm)
# to reduce the computation time, the same model is estimated using
# a vector of initial values
m1 <- multlcmm(Ydep1+Ydep2~1+Time*X2+contrast(X2),random=~1+Time,
subject="ID",randomY=TRUE,link=c("4-manual-splines","3-manual-splines"),
intnodes=c(8,12,25),data=data_Jointlcmm,
B=c(-1.071, -0.192, 0.106, -0.005, -0.193, 1.012, 0.870, 0.881,
0.000, 0.000, -7.520, 1.401, 1.607 , 1.908, 1.431, 1.082,
-7.528, 1.135 , 1.454 , 2.328, 1.052))
# output of the model
summary(m1)
# estimated link functions
plot(m1,which="linkfunction")
# variation percentages explained by linear mixed regression
VarExpl(m1,data.frame(Time=0))
#### Heterogeneous latent process mixed model with linear link functions
#### and 2 latent classes of trajectory
m2 <- multlcmm(Ydep1+Ydep2~1+Time*X2,random=~1+Time,subject="ID",
link="linear",ng=2,mixture=~1+Time,classmb=~1+X1,data=data_Jointlcmm,
B=c( 18,-20.77,1.16,-1.41,-1.39,-0.32,0.16,-0.26,1.69,1.12,1.1,10.8,
1.24,24.88,1.89))
# summary of the estimation
summary(m2)
# posterior classification
postprob(m2)
# longitudinal predictions in the outcomes scales for a given profile of covariates
newdata <- data.frame(Time=seq(0,5,length=100),X1=rep(0,100),X2=rep(0,100),X3=rep(0,100))
predGH <- predictY(m2,newdata,var.time="Time",methInteg=0,nsim=20)
head(predGH)
Run the code above in your browser using DataLab