Learn R Programming

BNSP (version 2.2.3)

lmrm: Bayesian semiparametric modelling of covariance matrices for multivariate longitudinal data

Description

Implements an MCMC algorithm for posterior sampling based on a semiparametric model for continuous longitudinal multivariate responses. The overall model consists of 5 regression submodels and it utilizes spike-slab priors for variable selection and function regularization. See `Details' section for a full description of the model.

Usage

lmrm(formula, data = list(), centre=TRUE, id, time,
sweeps, burn = 0, thin = 1, seed, StorageDir,
c.betaPrior = "IG(0.5,0.5*n*p)", pi.muPrior = "Beta(1,1)", 
c.alphaPrior = "IG(1.1,1.1)", pi.phiPrior = "Beta(1,1)", c.psiPrior = "HN(2)",
sigmaPrior = "HN(2)", pi.sigmaPrior = "Beta(1,1)", 
corr.Model = c("common", nClust = 1), DP.concPrior = "Gamma(5,2)",
c.etaPrior = "IG(0.5,0.5*samp)", pi.nuPrior = "Beta(1,1)", 
pi.fiPrior = "Beta(1,1)", c.omegaPrior = "IG(1.1,1.1)", sigmaCorPrior = "HN(2)",
tuneCalpha, tuneSigma2, tuneCbeta, tuneAlpha, tuneSigma2R, tuneR, tuneCpsi, 
tuneCbCor, tuneOmega, tuneComega, tau, FT = 1,...)

Value

Function lmrm returns samples from the posteriors of the model parameters.

Arguments

formula

a formula defining the responses and the covariates in the 5 regression models e.g. y1 | y2 ~ x | w | z | t | t or for smooth effects y1 | y2 ~ sm(x) | sm(w) | sm(z) | sm(t) | sm(t). The package uses the extended formula notation, where the responses are defined on the left of ~ and the models on the right.

data

a data frame.

centre

Binary indicator. If set equal to TRUE, the design matrices are centred, to have column mean equl to zero, otherwise, if set to FALSE, the columns are not centred.

id

identifiers of the individuals or other sampling units that are observed over time.

time

a vector input that specifies the time of observation

sweeps

total number of posterior samples, including those discarded in burn-in period (see argument burn) and those discarded by the thinning process (see argument thin).

burn

length of burn-in period.

thin

thinning parameter.

seed

optional seed for the random generator.

StorageDir

a required directory to store files with the posterior samples of models parameters.

c.betaPrior

The inverse Gamma prior of \(c_{\beta}\). The default is "IG(0.5,0.5*n*p)", that is, an inverse Gamma with parameters \(1/2\) and \(np/2\), where \(n\) is the number of sampling units and \(p\) is the length of the response vector.

pi.muPrior

The Beta prior of \(\pi_{\mu}\). The default is "Beta(1,1)". It can be of dimension \(1\), of dimension \(K\) (the number of effects that enter the mean model), or of dimension \(p K\)

c.alphaPrior

The inverse Gamma prior of \(c_{\alpha}^2\). The default is "IG(1.1,1.1)". Half-normal priors for \(c_{\alpha}\) are also available, declared using "HN(a)", where "a" is a positive number. It can be of dimension \(1\) or \(p\) (the length of the multivariate response).

pi.phiPrior

The Beta prior of \(\pi_{\phi}\). The default is "Beta(1,1)". It can be of dimension \(1\), of dimension \(B\) (the number of effects that enter the dependence model), or of dimension \(p^2 B\)

c.psiPrior

The prior of \(c_{\psi}^2\). The default is "HN(2)", a half-normal prior for \(c_{\psi}\) with variance equal to two, \(c_{\psi} \sim N(0,2) I[c_{\psi} > 0]\). Inverse Gamma priors for \(c_{\psi}^2\) are also available, declared using "IG(a,b)". It can be of dimension \(1\) or \(p^2\) (the number of dependence models).

sigmaPrior

The prior of \(\sigma_k^2, k=1,\dots,p\). The default is "HN(2)", a half-normal prior for \(\sigma_k\) with variance equal to two, \(\sigma_k \sim N(0,2) I[\sigma>0]\). Inverse Gamma priors for \(\sigma_k^2\) are also available, declared using "IG(a,b)". It can be of dimension \(1\) or \(p\) (the length of the multivariate response).

pi.sigmaPrior

The Beta prior of \(\pi_{\sigma}\). The default is "Beta(1,1)". It can be of dimension \(1\), of dimension \(L\) (the number of effects that enter the variance model), or of dimension \(pL\)

corr.Model

Specifies the model for the correlation matrices \(R_t\). The three choices supported are "common", that specifies a common correlations model, "groupC", that specifies a grouped correlations model, and "groupV", that specifies a grouped variables model. When the model chosen is either "groupC" or "groupV", the upper limit on the number of clusters can also be specified, using corr.Model = c("groupC", nClust = d) or corr.Model = c("groupV", nClust = p). If the number of clusters is left unspecified, for the "groupV" model, it is taken to be \(p\), the number of responses. For the "groupC" model, it is taken to be \(d = p(p-1)/2\), the number of free elements in the correlation matrices.

DP.concPrior

The Gamma prior for the Dirichlet process concentration parameter.

c.etaPrior

The inverse Gamma prior of \(c_{\eta}\). The default is "IG(0.5,0.5*samp)", that is, an inverse Gamma with parameters \(1/2\) and \(samp/2\), where \(samp\) is the number of correlations observed over time, that is $samp=M*d$ where $M$ is the number of unique observation time points and $d$ is the number of non-redundant elements of $R$.

pi.nuPrior

The Beta prior of \(\pi_{\nu}\). The default is "Beta(1,1)". It can be of dimension \(1\).

pi.fiPrior

The Beta prior of \(\pi_{\varphi}\). The default is "Beta(1,1)". It can be of dimension \(1\).

c.omegaPrior

The prior of \(c_{\omega}^2\). The default is "HN(2)", a half-normal prior for \(c_{\omega}\) with variance equal to two, \(c_{\omega} \sim N(0,2) I[c_{\omega} > 0]\). Inverse Gamma priors for \(c_{\omega}^2\) are also available, declared using "IG(a,b)". It can be of dimension \(1\).

sigmaCorPrior

The prior of \(\sigma^2\). The default is "HN(2)", a half-normal prior for \(\sigma^2\) with variance equal to two, \(\sigma \sim N(0,2) I[\sigma > 0]\). Inverse Gamma priors for \(\sigma^2\) are also available, declared using "IG(a,b)". It can be of dimension \(1\).

tuneCalpha

Starting value of the tuning parameter for sampling \(c_{\alpha k}, k=1,\dots,p\). Defaults at a vector of $p$ ones. It could be of dimension \(p\).

tuneSigma2

Starting value of the tuning parameter for sampling \(\sigma^2_k, k=1,\dots,p\). Defaults at a vector of $p$ ones. It could be of dimension \(p\).

tuneCbeta

Starting value of the tuning parameter for sampling \(c_{\beta}\). Defaults at 100.

tuneAlpha

Starting value of the tuning parameter for sampling regression coefficients of the variance models \(\alpha_k, k=1,\dots,p\). Defaults at a vector of 5s. It could be of dimension \(L p\)

tuneSigma2R

Starting value of the tuning parameter for sampling \(\sigma^2\). Defaults at 1.

tuneR

Starting value of the tuning parameter for sampling correlation matrices. Defaults at \(40*(p+2)^3\). Can be of dimension \(1\) or \(M\) is the number of unique observation time points.

tuneCpsi

Starting value of the tuning parameter for sampling variances \(c_{\psi}^2\). Defaults at 5. Can be of dimension \(1\) or \(p^2\)

.

tuneCbCor

Starting value of the tuning parameter for sampling \(c_{\eta}^2\). Defaults at 10.

tuneOmega

Starting value of the tuning parameter for sampling regression coefficients of the variance models \(\omega\). Defaults at 5.

tuneComega

Starting value of the tuning parameter for sampling \(c_{\omega}\). Defaults at 1.

tau

The \(tau\) of the shadow prior. Defaults at 0.01.

FT

Binary indicator. If set equal to 1, the Fisher's z transform of the correlations is modelled, otherwise if set equal to 0, the untransformed correlations are modelled.

...

Other options that will be ignored.

Author

Georgios Papageorgiou gpapageo@gmail.com

Details

Function lmrm returns samples from the posterior distributions of the parameters of a regression model with normally distributed multivariate longitudinal responses. To describe the model, let \(Y_{ij} = (Y_{ij1},\dots,Y_{ijp})^{\top}\) denote the vector of \(p\) responses observed on individual \(i, i=1,\dots,n,\) at time point \(t_{ij}, j=1,\dots,n_i\). The observational time points \(t_{ij}\) are allowed to be unequally spaced. Further, let \(u_{ij}\) denote the covariate vector that is observed along with \(Y_{ij}\) and that may include time, other time-dependent covariates and time-independent ones. In addition, let \(Y_{i}=(Y_{i1}^{\top},\dots,Y_{in_i}^{\top})^{\top}\) denote the \(i\)th response vector. With \(\mu_i=E(Y_{i})\) and \(\Sigma_i = \) cov\((Y_i)\), the model assumes multivariate normality, \(Y_{i} \sim N(\mu_i, \Sigma_i), i=1,2,\dots,n\). The means \(\mu_i\) and covariance matrices \(\Sigma_i\) are modelled semiparametrically in terms of covariates. For the means one can specify semiparametric models, $$\mu_{ijk} = \beta_{k0} + \sum_{l=1}^{K_1} u_{ijl} \beta_{kl} + \sum_{l=K_1+1}^{K} f_{\mu,k,l}(u_{ijl}).$$ This is the first of the 5 regression submodels.

To model the covariance matrix, first consider the modified Cholesky decomposition, \(L_i \Sigma_i L_i^{\top} = D_i,\) where \(L_i\) is a unit block lower triangular matrix and \(D_i\) is a block diagonal matrix, $$ \begin{array}{cc} L_i = \left[ \begin{array}{cccc} I & 0 & \dots & 0 \\ -\Phi_{i21} & I & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ -\Phi_{in_i1} & -\Phi_{in_i1} & \dots & I \\ \end{array} \right], & D_i = \left[ \begin{array}{cccc} D_{1} & 0 & \dots & 0 \\ 0 & D_{2} & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & D_{n_i} \\ \end{array} \right], \end{array} $$

For modelling \(D_{ij}, i=1,\dots,n, j=1,\dots,n_i\) in terms of covariates, first we separate the variances and the correlations \(D_{ij} = S_{ij}^{1/2} R_{ij} S_{ij}^{1/2}\). It is easy to model matrix \(S_{ij}\) in terms of covariates as the only requirement on its diagonal elements is that they are nonnegative, $$\log \sigma^2_{ijk} = \alpha_{k0} + \sum_{l=1}^{L_1} w_{ijl} \alpha_{kl} + \sum_{l=L_1+1}^{L} f_{\sigma,k,l}(w_{ijl})$$ This is the second of the 5 regression submodels.

For \(\phi_{ijklm}\), the \((l,m)\) element of \(\Phi_{ijk}, l,m=1,\dots,p\), one can specify semiparametric models $$ \phi_{ijklm} = \psi_{lm0} + \sum_{b=1}^{B_1} v_{ijkb} \psi_{lmb} + \sum_{b=B_1+1}^{B} f_{\phi,l,m,b}(v_{ijkb}) $$ This is the third of the 5 regression submodels.

The elements of the correlations matrices \(R_{ij}\) are modelled in terms of covariate time only, hence they are denoted by \(R_t\). Subject to the positive definiteness constraint, the elements of \(R_t\) are modelled using a normal distribution with location and scale parameters, \(\mu_{ct}\) and \(\sigma^2_{ct}\), modelled as $$ \mu_{ct} = \eta_0 + f_{\mu}(t), $$ $$ \log \sigma^2_{ct} = \omega_0 + f_{\sigma}(t), $$ and these are the last 2 of the 5 submodels.

References

Papageorgiou, G. (2020). Bayesian semiparametric modelling of covariance matrices for multivariate longitudinal data. arXiv:2012.09833.

Examples

Run this code
	# Fit a joint mean-covariance model on the simulated dataset simD2
	require(ggplot2)
	data(simD2)
	model <- Y1 | Y2 ~ time | sm(time) | sm(lag) | sm(time) | 1 
	# the above defines the responses and the regression models on the left and 
	# right of "~", respectively 
	# the first model, for the mean, is a linear function of time, this is sufficient as 
	# the 2 responses have constant mean.
	# the second model, for the variances, is a smooth function of time
	# the third model, for the dependence structure, is a smooth function of lag, 
	# that lmrm figures out and it does not need to be computed by the user
	# the fourth model, for location of the correlations, is a smooth function of time
	# the fifth model, for scale of the correlations, is just an intercept model
	if (FALSE) {
	m1 <- lmrm(formula = model, corr.Model = c("common", nClust = 1), data = simD2,
		       id = id, time = time, sweeps = 2500, burn = 500, thin = 2, 
		       StorageDir = getwd(), seed = 1)
	plot(m1)
	}

Run the code above in your browser using DataLab