generate_Gaussian: Generate a Gaussian distributed data set

Description

Generates a Gaussian distributed data set with latent variables and correlated replicates.

Usage

generate_Gaussian(n, R, p, l, s, sparsityA, sparsityobserved, sparsitylatent,
                  lwb, upb, seed)

Arguments

n

the number of observations.

R

the number of replicates.

p

the number of observed variables.

l

the number of latent variables.

s

latent effects are generated as $s$-piecewise constant across replicates. $s$ must be a factor of $R$.

sparsityA

proportion of zero entries in the transition matrix $A$. Must be between 0 and 1.

sparsityobserved

proportion of zero entries in the inverse covariance of the observed variables. Must be between 0 and 1.

sparsitylatent

proportion of zero entries in the inverse covariances among the latent variables and between the observed and latent variables. Must be between 0 and 1.

lwb

lower bound for the elements in the inverse covariance matrix.

upb

upper bound for the elements in the inverse covariance matrix.

seed

the seed for the random number generator.
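The constraints stated in the argument descriptions can be checked before calling the function. The following is a minimal sketch only: `check_generate_args` is a hypothetical helper, not part of the package, and the requirement that `lwb` not exceed `upb` is an assumption that the descriptions above do not state explicitly.

```r
# Hypothetical helper (not part of the package): checks the constraints
# stated in the argument descriptions before calling generate_Gaussian().
check_generate_args <- function(R, s, sparsityA, sparsityobserved,
                                sparsitylatent, lwb, upb) {
  # s should be a factor of R
  if (R %% s != 0) stop("s must be a factor of R")

  # each sparsity level is a proportion between 0 and 1
  sp <- c(sparsityA = sparsityA,
          sparsityobserved = sparsityobserved,
          sparsitylatent = sparsitylatent)
  bad <- names(sp)[sp < 0 | sp > 1]
  if (length(bad) > 0) {
    stop("must be between 0 and 1: ", paste(bad, collapse = ", "))
  }

  # assumption: the lower bound should not exceed the upper bound
  if (lwb > upb) stop("lwb must not exceed upb")

  invisible(TRUE)
}

# Passes silently for the settings used in the Examples section
check_generate_args(R = 20, s = 2, sparsityA = 0.95, sparsityobserved = 0.9,
                    sparsitylatent = 0.2, lwb = 0.3, upb = 0.3)
```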

Value

the generated data: a list with $n$ elements, each of which is a matrix with $R$ rows and $p$ columns.

truegraph

a matrix that encodes the conditional dependence relationships between pairs of observed variables.

Details

This function generates a Gaussian distributed data set with latent variables and correlated replicates. For each observation, the latent variables are piecewise constant across replicates and, conditional on the latent variables, the replicates follow a one-lag vector autoregressive model whose transition matrix $A$ is sparse, with non-zero elements set equal to 0.3. The matrix $\Sigma$ is the joint covariance matrix of the observed variables $X$ and the latent variables $U$. We partition $\Sigma$ into blocks that quantify the relationships among the observed variables ($\Sigma_{XX}$), between the observed variables and the latent variables ($\Sigma_{XU}$ or $\Sigma_{UX}$), and among the latent variables ($\Sigma_{UU}$). In general, the data are generated as:

$$ X_{i1} | U_{i1} \sim N_p(\Sigma_{XU}\Sigma^{-1}_{UU} U_{i1}, \Sigma_{XX} - \Sigma_{XU}\Sigma^{-1}_{UU}\Sigma_{UX}), $$

$$ X_{it} | X_{i(t-1)},U_{it} \sim N_p(AX_{i(t-1)} + \Sigma_{XU}\Sigma^{-1}_{UU} U_{it}, \Sigma_{XX} - \Sigma_{XU}\Sigma^{-1}_{UU}\Sigma_{UX}), $$

where $1 \le i \le n$, and the second expression applies for $2 \le t \le R$.
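The two conditional distributions above can be simulated directly in base R. The sketch below is illustrative only and is not the package's implementation: the covariance values, the single non-zero entry of $A$, and the helper `rmvn` are all hypothetical, and it assumes that $s$ is the number of constant segments (each of length $R/s$) and that the latent variables are drawn from $N_l(0, \Sigma_{UU})$ once per segment.

```r
set.seed(1)
n <- 5; R <- 6; p <- 3; l <- 1
s <- 2  # assumed reading: s constant segments, each of length R / s

# Illustrative joint covariance of (X, U); the values are hypothetical.
Sigma <- diag(p + l)
Sigma[1, 4] <- Sigma[4, 1] <- 0.5
Sigma[2, 4] <- Sigma[4, 2] <- 0.3

ix <- 1:p
iu <- (p + 1):(p + l)
S_XX <- Sigma[ix, ix, drop = FALSE]
S_XU <- Sigma[ix, iu, drop = FALSE]
S_UU <- Sigma[iu, iu, drop = FALSE]

B <- S_XU %*% solve(S_UU)       # Sigma_XU Sigma_UU^{-1}, a p x l matrix
S_cond <- S_XX - B %*% t(S_XU)  # Sigma_XX - Sigma_XU Sigma_UU^{-1} Sigma_UX

# Sparse transition matrix with non-zero entries equal to 0.3
A <- matrix(0, p, p)
A[1, 2] <- 0.3

# Draw m rows from N(0, S) via a Cholesky factor (hypothetical helper)
rmvn <- function(m, S) matrix(rnorm(m * ncol(S)), m) %*% chol(S)

X <- lapply(seq_len(n), function(i) {
  # Latent variables: one draw per segment, repeated within the segment
  U_seg <- rmvn(s, S_UU)                                      # s x l
  U <- U_seg[rep(seq_len(s), each = R / s), , drop = FALSE]   # R x l
  E <- rmvn(R, S_cond)                                        # R x p noise

  Xi <- matrix(0, R, p)
  Xi[1, ] <- B %*% U[1, ] + E[1, ]                            # first replicate
  for (t in 2:R) {
    Xi[t, ] <- A %*% Xi[t - 1, ] + B %*% U[t, ] + E[t, ]      # one-lag VAR step
  }
  Xi
})

length(X)    # number of observations
dim(X[[1]])  # replicates by observed variables
```

The result has the same shape as the documented output: a list with $n$ elements, each a matrix with $R$ rows and $p$ columns.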

References

Jin, Y., Ning, Y., and Tan, K. M. (2020), "Exponential Family Graphical Models with Correlated Replicates and Unmeasured Confounders", preprint available.

Examples

# NOT RUN {
data <- generate_Gaussian(n = 50, R = 20, p = 30, l = 2, s = 2, sparsityA = 0.95,
                          sparsityobserved = 0.9, sparsitylatent = 0.2,
                          lwb = 0.3, upb = 0.3, seed = 1)
# }
