Let \(Y_1,\ldots,Y_k,\ldots,Y_K\) be the considered indicators and \(\mbox{y}_{i,t}=(y_{i,t,1},\ldots,y_{i,t,k},\ldots,y_{i,t,K})'\) denote their observation on unit \(i\) (\(i=1,\ldots,n\)) at time \(t\) (\(t=1,\ldots,T\)).
Also, let \(\bar{y}_{i,k}\) and \(s_{i,k}\) be, respectively, the sample mean and the sample standard deviation of indicator \(Y_k\) for unit \(i\) across the whole observation period.
Each indicator is normalized within units according to one of the following methods:
0) no normalization:
$$y^*_{i,t,k}=y_{i,t,k}$$
1) centering:
$$y^*_{i,t,k}=y_{i,t,k}-\bar{y}_{i,k}$$
2) standardization:
$$y^*_{i,t,k}=\frac{y_{i,t,k}-\bar{y}_{i,k}}{s_{i,k}}$$
3) ratio to the mean:
$$y^*_{i,t,k}=\frac{y_{i,t,k}}{\bar{y}_{i,k}}$$
4) logarithmic ratio to the mean:
$$y^*_{i,t,k}=\log\left(\frac{y_{i,t,k}}{\bar{y}_{i,k}}\right)\approx\frac{y_{i,t,k}-\bar{y}_{i,k}}{\bar{y}_{i,k}}$$
Normalization is required when the trajectories have different levels across units. When the indicators have different scales of measurement, standardization is needed to make measurements of different indicators comparable. Ratio to the mean and logarithmic ratio to the mean allow comparisons among different indicators as well, but they can be applied only to strictly positive measurements.
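As an illustration, the five normalization methods can be sketched as follows. This is a minimal Python sketch for one unit and one indicator; the function name `normalize` and the method coding 0-4 mirror the list above, but this is not the package's R code:

```python
import numpy as np

def normalize(y, method=2):
    """Apply one of the within-unit normalization methods (0-4) to a
    1-D array holding the series of one indicator for one unit.
    Illustrative sketch; the gbmt package performs this internally in R."""
    y = np.asarray(y, dtype=float)
    m, s = y.mean(), y.std(ddof=1)  # sample mean and sd across the period
    if method == 0:                 # no normalization
        return y
    if method == 1:                 # centering
        return y - m
    if method == 2:                 # standardization
        return (y - m) / s
    if method == 3:                 # ratio to the mean (requires y > 0)
        return y / m
    if method == 4:                 # logarithmic ratio to the mean (y > 0)
        return np.log(y / m)
    raise ValueError("method must be 0, 1, 2, 3 or 4")
```

For example, with the series (2, 4, 6) the sample mean is 4 and the sample standard deviation is 2, so standardization yields (-1, 0, 1) and ratio to the mean yields (0.5, 1, 1.5).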
Denote the hypothesized groups as \(j=1,\ldots,J\) and let \(G_i\) be a latent variable taking value \(j\) if unit \(i\) belongs to group \(j\).
A group-based multivariate trajectory model with polynomial degree \(d\) is defined as:
$$\mbox{y}^*_{i,t}\mid G_i=j\sim\mbox{MVN}\left(\mu_{j,t},\Sigma_j\right)\hspace{.9cm}j=1,\ldots,J$$
$$\mu_{j,t}=\mbox{B}_j'\left(1,t,t^2,\ldots,t^d\right)'$$
where \(\mbox{B}_j\) is the \((d+1)\times K\) matrix of regression coefficients in group \(j\), so that \(\mu_{j,t}\) is the \(K\)-dimensional mean vector at time \(t\), and \(\Sigma_j\) is the \(K \times K\) covariance matrix of the indicators in group \(j\).
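To make the polynomial mean concrete, here is a minimal Python sketch with made-up coefficients for \(K=2\) indicators and degree \(d=1\); `mean_trajectory` is a hypothetical helper, not part of the package:

```python
import numpy as np

# Mean trajectory of group j at time t for a polynomial of degree d:
# mu_{j,t} = B_j' (1, t, t^2, ..., t^d)', where B_j is (d+1) x K.
# Made-up coefficients: K = 2 indicators, d = 1 (intercept and slope).
B_j = np.array([[1.0, 0.5],    # intercepts of the two indicators
                [2.0, -1.0]])  # slopes of the two indicators

def mean_trajectory(B, t, d):
    basis = np.array([t ** p for p in range(d + 1)])  # (1, t, ..., t^d)
    return B.T @ basis                                # length-K mean vector
```

With these coefficients, at \(t=3\) the mean vector is \((1+2\cdot 3,\; 0.5-1\cdot 3)=(7,-2.5)\).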
The likelihood of the model is:
$$\mathcal{L}(\mbox{B}_1,\ldots,\mbox{B}_J,\Sigma_1,\ldots,\Sigma_J,\pi_1,\ldots,\pi_J)=\prod_{i=1}^n\left[\sum_{j=1}^J\pi_j \prod_{t=1}^T\phi(\mbox{y}^*_{i,t}\mid \mbox{B}_j,\Sigma_j)\right]$$
where \(\phi(\mbox{y}^*_{i,t}\mid \mbox{B}_j,\Sigma_j)\) is the multivariate Normal density of \(\mbox{y}^*_{i,t}\) in group \(j\), and \(\pi_j\) is the prior probability of group \(j\). Note that the product over \(t\) reflects the assumption that observations at different time points are independent conditionally on group membership.
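The likelihood above can be evaluated on the log scale as in the following Python sketch (hypothetical helper names, not the package's R code); it assumes the group mean trajectories have already been computed, and uses the log-sum-exp trick for numerical stability:

```python
import numpy as np

def mvn_logpdf(x, mu, Sigma):
    """Log-density of the multivariate Normal (phi in the formula above)."""
    diff = np.atleast_1d(x - mu)
    K = diff.size
    return -0.5 * (K * np.log(2 * np.pi) + np.log(np.linalg.det(Sigma))
                   + diff @ np.linalg.solve(Sigma, diff))

def log_likelihood(Y, mus, Sigmas, pis):
    """Mixture log-likelihood: sum_i log sum_j pi_j prod_t phi(...).
    Y: (n, T, K) normalized data; mus[j]: (T, K) mean trajectory of group j;
    Sigmas[j]: (K, K) covariance; pis: prior probabilities.
    Illustrative sketch under the conditional-independence assumption."""
    total = 0.0
    for i in range(Y.shape[0]):
        log_terms = np.array([
            np.log(pis[j]) + sum(mvn_logpdf(Y[i, t], mus[j][t], Sigmas[j])
                                 for t in range(Y.shape[1]))
            for j in range(len(pis))])
        m = log_terms.max()                      # log-sum-exp for stability
        total += m + np.log(np.exp(log_terms - m).sum())
    return total
```

Working on the log scale avoids underflow: the product of \(T\) densities can be far below machine precision even for moderate \(T\).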
The posterior probability of group \(j\) for unit \(i\) is computed as:
$$\mbox{Pr}(G_i=j\mid \mbox{y}^*_i)\equiv\pi_{i,j}=\frac{\widehat{\pi}_j \prod_{t=1}^{T}\phi(\mbox{y}^*_{i,t}\mid \widehat{\mbox{B}}_j,\widehat{\Sigma}_j)}{\sum_{l=1}^J\widehat{\pi}_l \prod_{t=1}^{T}\phi(\mbox{y}^*_{i,t}\mid \widehat{\mbox{B}}_l,\widehat{\Sigma}_l)}$$
where \(\mbox{y}^*_i=(\mbox{y}^*_{i,1},\ldots,\mbox{y}^*_{i,T})\) collects all observations on unit \(i\), and the hat symbol above a parameter denotes its estimated value.
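Given the per-group log-densities, the posterior probabilities can be computed as in this Python sketch (`posterior_probs` is a hypothetical helper, not the package's R code; computations are done on the log scale for numerical stability):

```python
import numpy as np

def posterior_probs(log_phi, pi_hat):
    """Posterior group probabilities for one unit, given
    log_phi[j] = sum_t log phi(y*_{i,t} | B_hat_j, Sigma_hat_j)
    (the log of the product over time in the numerator) and the
    estimated prior probabilities pi_hat. Illustrative sketch."""
    log_w = np.log(pi_hat) + np.asarray(log_phi)
    w = np.exp(log_w - log_w.max())   # subtract the max before exponentiating
    return w / w.sum()                # normalize so probabilities sum to 1

# Example: group 2 fits this unit's trajectory much better than group 1
p = posterior_probs([-40.0, -12.0], [0.5, 0.5])
```

In the example, the second group's log-density is far higher, so its posterior probability is close to 1; with equal log-densities and equal priors the posterior is uniform.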
See the vignette of the package and Magrini (2022) for details on maximum likelihood estimation through the EM algorithm.
S3 methods available for class gbmt include:
print: to see the estimated regression coefficients for each group;
summary: to obtain the summary of the linear regressions (a list with one component for each group and each indicator);
plot: to display estimated and predicted trajectories. See plot.gbmt for details;
coef: to see the estimated coefficients (a list with one component for each group);
fitted: to obtain the fitted values, i.e., the estimated group trajectories (a list with one component for each group);
residuals: to obtain the residuals (a list with one component for each group);
predict: to perform prediction on trajectories. See predict.gbmt for details;
logLik: to get the log likelihood;
AIC, extractAIC: to get the Akaike information criterion;
BIC: to get the Bayesian information criterion.