lmrm: Bayesian semiparametric modelling of covariance matrices for multivariate longitudinal data

Description

Implements an MCMC algorithm for posterior sampling based on a semiparametric model for continuous longitudinal multivariate responses. The overall model consists of 5 regression submodels and it utilizes spike-slab priors for variable selection and function regularization. See `Details' section for a full description of the model.

Usage

lmrm(formula, data = list(), centre=TRUE, id, time,
sweeps, burn = 0, thin = 1, seed, StorageDir,
c.betaPrior = "IG(0.5,0.5*n*p)", pi.muPrior = "Beta(1,1)", 
c.alphaPrior = "IG(1.1,1.1)", pi.phiPrior = "Beta(1,1)", c.psiPrior = "HN(2)",
sigmaPrior = "HN(2)", pi.sigmaPrior = "Beta(1,1)", 
corr.Model = c("common", nClust = 1), DP.concPrior = "Gamma(5,2)",
c.etaPrior = "IG(0.5,0.5*samp)", pi.nuPrior = "Beta(1,1)", 
pi.fiPrior = "Beta(1,1)", c.omegaPrior = "IG(1.1,1.1)", sigmaCorPrior = "HN(2)",
tuneCalpha, tuneSigma2, tuneCbeta, tuneAlpha, tuneSigma2R, tuneR, tuneCpsi, 
tuneCbCor, tuneOmega, tuneComega, tau, FT = 1,...)

Value

Function lmrm returns samples from the posteriors of the model parameters.

Arguments

formula: a formula defining the responses and the covariates in the 5 regression models e.g. y1 | y2 ~ x | w | z | t | t or for smooth effects y1 | y2 ~ sm(x) | sm(w) | sm(z) | sm(t) | sm(t). The package uses the extended formula notation, where the responses are defined on the left of ~ and the models on the right.
data: a data frame.
centre: Binary indicator. If set equal to TRUE, the design matrices are centred, to have column mean equl to zero, otherwise, if set to FALSE, the columns are not centred.
id: identifiers of the individuals or other sampling units that are observed over time.
time: a vector input that specifies the time of observation
sweeps: total number of posterior samples, including those discarded in burn-in period (see argument burn) and those discarded by the thinning process (see argument thin).
burn: length of burn-in period.
thin: thinning parameter.
seed: optional seed for the random generator.
StorageDir: a required directory to store files with the posterior samples of models parameters.
c.betaPrior: The inverse Gamma prior of $c_{\beta}$. The default is "IG(0.5,0.5*n*p)", that is, an inverse Gamma with parameters $1/2$ and $np/2$, where $n$ is the number of sampling units and $p$ is the length of the response vector.
pi.muPrior: The Beta prior of $\pi_{\mu}$. The default is "Beta(1,1)". It can be of dimension $1$, of dimension $K$ (the number of effects that enter the mean model), or of dimension $p K$
c.alphaPrior: The inverse Gamma prior of $c_{\alpha}^2$. The default is "IG(1.1,1.1)". Half-normal priors for $c_{\alpha}$ are also available, declared using "HN(a)", where "a" is a positive number. It can be of dimension $1$ or $p$ (the length of the multivariate response).
pi.phiPrior: The Beta prior of $\pi_{\phi}$. The default is "Beta(1,1)". It can be of dimension $1$, of dimension $B$ (the number of effects that enter the dependence model), or of dimension $p^2 B$
c.psiPrior: The prior of $c_{\psi}^2$. The default is "HN(2)", a half-normal prior for $c_{\psi}$ with variance equal to two, $c_{\psi} \sim N(0,2) I[c_{\psi} > 0]$. Inverse Gamma priors for $c_{\psi}^2$ are also available, declared using "IG(a,b)". It can be of dimension $1$ or $p^2$ (the number of dependence models).
sigmaPrior: The prior of $\sigma_k^2, k=1,\dots,p$. The default is "HN(2)", a half-normal prior for $\sigma_k$ with variance equal to two, $\sigma_k \sim N(0,2) I[\sigma>0]$. Inverse Gamma priors for $\sigma_k^2$ are also available, declared using "IG(a,b)". It can be of dimension $1$ or $p$ (the length of the multivariate response).
pi.sigmaPrior: The Beta prior of $\pi_{\sigma}$. The default is "Beta(1,1)". It can be of dimension $1$, of dimension $L$ (the number of effects that enter the variance model), or of dimension $pL$
corr.Model: Specifies the model for the correlation matrices $R_t$. The three choices supported are "common", that specifies a common correlations model, "groupC", that specifies a grouped correlations model, and "groupV", that specifies a grouped variables model. When the model chosen is either "groupC" or "groupV", the upper limit on the number of clusters can also be specified, using corr.Model = c("groupC", nClust = d) or corr.Model = c("groupV", nClust = p). If the number of clusters is left unspecified, for the "groupV" model, it is taken to be $p$, the number of responses. For the "groupC" model, it is taken to be $d = p(p-1)/2$, the number of free elements in the correlation matrices.
DP.concPrior: The Gamma prior for the Dirichlet process concentration parameter.
c.etaPrior: The inverse Gamma prior of $c_{\eta}$. The default is "IG(0.5,0.5*samp)", that is, an inverse Gamma with parameters $1/2$ and $samp/2$, where $samp$ is the number of correlations observed over time, that is $samp=M*d$ where $M$ is the number of unique observation time points and $d$ is the number of non-redundant elements of $R$.
pi.nuPrior: The Beta prior of $\pi_{\nu}$. The default is "Beta(1,1)". It can be of dimension $1$.
pi.fiPrior: The Beta prior of $\pi_{\varphi}$. The default is "Beta(1,1)". It can be of dimension $1$.
c.omegaPrior: The prior of $c_{\omega}^2$. The default is "HN(2)", a half-normal prior for $c_{\omega}$ with variance equal to two, $c_{\omega} \sim N(0,2) I[c_{\omega} > 0]$. Inverse Gamma priors for $c_{\omega}^2$ are also available, declared using "IG(a,b)". It can be of dimension $1$.
sigmaCorPrior: The prior of $\sigma^2$. The default is "HN(2)", a half-normal prior for $\sigma^2$ with variance equal to two, $\sigma \sim N(0,2) I[\sigma > 0]$. Inverse Gamma priors for $\sigma^2$ are also available, declared using "IG(a,b)". It can be of dimension $1$.
tuneCalpha: Starting value of the tuning parameter for sampling $c_{\alpha k}, k=1,\dots,p$. Defaults at a vector of $p$ ones. It could be of dimension $p$.
tuneSigma2: Starting value of the tuning parameter for sampling $\sigma^2_k, k=1,\dots,p$. Defaults at a vector of $p$ ones. It could be of dimension $p$.
tuneCbeta: Starting value of the tuning parameter for sampling $c_{\beta}$. Defaults at 100.
tuneAlpha: Starting value of the tuning parameter for sampling regression coefficients of the variance models $\alpha_k, k=1,\dots,p$. Defaults at a vector of 5s. It could be of dimension $L p$
tuneSigma2R: Starting value of the tuning parameter for sampling $\sigma^2$. Defaults at 1.
tuneR: Starting value of the tuning parameter for sampling correlation matrices. Defaults at $40*(p+2)^3$. Can be of dimension $1$ or $M$ is the number of unique observation time points.
tuneCpsi: Starting value of the tuning parameter for sampling variances $c_{\psi}^2$. Defaults at 5. Can be of dimension $1$ or $p^2$

tuneCbCor: Starting value of the tuning parameter for sampling $c_{\eta}^2$. Defaults at 10.
tuneOmega: Starting value of the tuning parameter for sampling regression coefficients of the variance models $\omega$. Defaults at 5.
tuneComega: Starting value of the tuning parameter for sampling $c_{\omega}$. Defaults at 1.
tau: The $tau$ of the shadow prior. Defaults at 0.01.
FT: Binary indicator. If set equal to 1, the Fisher's z transform of the correlations is modelled, otherwise if set equal to 0, the untransformed correlations are modelled.
...: Other options that will be ignored.

Author

Georgios Papageorgiou gpapageo@gmail.com

Details

Function lmrm returns samples from the posterior distributions of the parameters of a regression model with normally distributed multivariate longitudinal responses. To describe the model, let $Y_{ij} = (Y_{ij1},\dots,Y_{ijp})^{\top}$ denote the vector of $p$ responses observed on individual $i, i=1,\dots,n,$ at time point $t_{ij}, j=1,\dots,n_i$. The observational time points $t_{ij}$ are allowed to be unequally spaced. Further, let $u_{ij}$ denote the covariate vector that is observed along with $Y_{ij}$ and that may include time, other time-dependent covariates and time-independent ones. In addition, let $Y_{i}=(Y_{i1}^{\top},\dots,Y_{in_i}^{\top})^{\top}$ denote the $i$th response vector. With $\mu_i=E(Y_{i})$ and $\Sigma_i = $ cov$(Y_i)$, the model assumes multivariate normality, $Y_{i} \sim N(\mu_i, \Sigma_i), i=1,2,\dots,n$. The means $\mu_i$ and covariance matrices $\Sigma_i$ are modelled semiparametrically in terms of covariates. For the means one can specify semiparametric models, $$\mu_{ijk} = \beta_{k0} + \sum_{l=1}^{K_1} u_{ijl} \beta_{kl} + \sum_{l=K_1+1}^{K} f_{\mu,k,l}(u_{ijl}).$$ This is the first of the 5 regression submodels.

To model the covariance matrix, first consider the modified Cholesky decomposition, $L_i \Sigma_i L_i^{\top} = D_i,$ where $L_i$ is a unit block lower triangular matrix and $D_i$ is a block diagonal matrix, $$ \begin{array}{cc} L_i = \left[ \begin{array}{cccc} I & 0 & \dots & 0 \\ -\Phi_{i21} & I & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ -\Phi_{in_i1} & -\Phi_{in_i1} & \dots & I \\ \end{array} \right], & D_i = \left[ \begin{array}{cccc} D_{1} & 0 & \dots & 0 \\ 0 & D_{2} & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & D_{n_i} \\ \end{array} \right], \end{array} $$

For modelling $D_{ij}, i=1,\dots,n, j=1,\dots,n_i$ in terms of covariates, first we separate the variances and the correlations $D_{ij} = S_{ij}^{1/2} R_{ij} S_{ij}^{1/2}$. It is easy to model matrix $S_{ij}$ in terms of covariates as the only requirement on its diagonal elements is that they are nonnegative, $$\log \sigma^2_{ijk} = \alpha_{k0} + \sum_{l=1}^{L_1} w_{ijl} \alpha_{kl} + \sum_{l=L_1+1}^{L} f_{\sigma,k,l}(w_{ijl})$$ This is the second of the 5 regression submodels.

For $\phi_{ijklm}$, the $(l,m)$ element of $\Phi_{ijk}, l,m=1,\dots,p$, one can specify semiparametric models $$ \phi_{ijklm} = \psi_{lm0} + \sum_{b=1}^{B_1} v_{ijkb} \psi_{lmb} + \sum_{b=B_1+1}^{B} f_{\phi,l,m,b}(v_{ijkb}) $$ This is the third of the 5 regression submodels.

The elements of the correlations matrices $R_{ij}$ are modelled in terms of covariate time only, hence they are denoted by $R_t$. Subject to the positive definiteness constraint, the elements of $R_t$ are modelled using a normal distribution with location and scale parameters, $\mu_{ct}$ and $\sigma^2_{ct}$, modelled as $$ \mu_{ct} = \eta_0 + f_{\mu}(t), $$ $$ \log \sigma^2_{ct} = \omega_0 + f_{\sigma}(t), $$ and these are the last 2 of the 5 submodels.

References

Papageorgiou, G. (2020). Bayesian semiparametric modelling of covariance matrices for multivariate longitudinal data. arXiv:2012.09833.

Examples

Run this code

	# Fit a joint mean-covariance model on the simulated dataset simD2
	require(ggplot2)
	data(simD2)
	model <- Y1 | Y2 ~ time | sm(time) | sm(lag) | sm(time) | 1 
	# the above defines the responses and the regression models on the left and 
	# right of "~", respectively 
	# the first model, for the mean, is a linear function of time, this is sufficient as 
	# the 2 responses have constant mean.
	# the second model, for the variances, is a smooth function of time
	# the third model, for the dependence structure, is a smooth function of lag, 
	# that lmrm figures out and it does not need to be computed by the user
	# the fourth model, for location of the correlations, is a smooth function of time
	# the fifth model, for scale of the correlations, is just an intercept model
	if (FALSE) {
	m1 <- lmrm(formula = model, corr.Model = c("common", nClust = 1), data = simD2,
		       id = id, time = time, sweeps = 2500, burn = 500, thin = 2, 
		       StorageDir = getwd(), seed = 1)
	plot(m1)
	}

Run the code above in your browser using DataLab