simData: Synthetic data generator

Description

Simulate data from a multivariate normal mixture using a mixture of factor analyzers mechanism.
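
For orientation, the standard mixture of factor analyzers model can be written as follows, using the notation of the Value section. The symbols \(x_i\), \(y_i\), \(z_i\), \(\varepsilon_i\), \(w_k\) and \(\Sigma_k\) are introduced here for illustration and are not part of the function's interface.

\[
x_i = \mu_{z_i} + \Lambda_{z_i} y_i + \varepsilon_i, \qquad
y_i \sim N_q(0, I_q), \qquad
\varepsilon_i \mid z_i = k \sim N_p(0, \Sigma_k), \qquad
P(z_i = k) = w_k ,
\]

so that, marginally,

\[
x_i \sim \sum_{k=1}^{K} w_k \, N_p\!\left(\mu_k,\; \Lambda_k \Lambda_k^{\top} + \Sigma_k\right),
\]

where each \(\Sigma_k\) is diagonal. Under this reading, sameSigma = TRUE corresponds to \(\Sigma_1 = \cdots = \Sigma_K\), and sINV_values supplies the diagonal of the inverse of the error covariance.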

Usage

simData(sameSigma, sameLambda, p, q, K.true, n, loading_means, loading_sd, sINV_values)

Value

A list with the following entries:

data: \(n\times p\) array containing the simulated data.
class: \(n\)-dimensional vector containing the class of each observation.
factorLoadings: \(K.true\times p \times q\)-array containing the factor loadings \(\Lambda_{krj}\) per cluster \(k\), feature \(r\) and factor \(j\), where \(k=1,\ldots,K\); \(r=1,\ldots,p\); \(j=1,\ldots,q\).
means: \(K.true\times p\) matrix containing the marginal means \(\mu_{kr}\), \(k=1,\ldots,K\); \(r=1,\ldots,p\).
variance: \(p\times p\) diagonal matrix containing the variance of errors \(\sigma_{rr}\), \(r=1,\ldots,p\). Note that the same variance of errors is assumed for each cluster.
factors: \(n\times q\) matrix containing the simulated factor values.
weights: \(K.true\)-dimensional vector containing the weight of each cluster.
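
The relationship between these entries can be sketched in base R. This is only an illustration of the generative mechanism, not the package's implementation: the helper simulate_mfa_sketch, the equal weights, and the distributions used for the means and loadings are all assumptions made for this example.

```r
# Hypothetical helper (not part of fabMix): draws n observations from a
# mixture of factor analyzers with a common diagonal error variance.
simulate_mfa_sketch <- function(n, p, q, K, sINV_diag) {
  stopifnot(q < p, length(sINV_diag) == p)
  weights  <- rep(1 / K, K)                          # assumption: equal weights
  class    <- sample(K, n, replace = TRUE, prob = weights)
  means    <- matrix(rnorm(K * p, sd = 5), K, p)     # assumption: arbitrary means
  loadings <- array(rnorm(K * p * q), dim = c(K, p, q))
  factors  <- matrix(rnorm(n * q), n, q)             # y_i ~ N(0, I_q)
  sigma_sd <- sqrt(1 / sINV_diag)                    # error sd per feature
  data <- matrix(NA_real_, n, p)
  for (i in seq_len(n)) {
    k <- class[i]
    Lambda_k <- matrix(loadings[k, , ], p, q)        # p x q loadings of cluster k
    noise <- rnorm(p, sd = sigma_sd)
    data[i, ] <- means[k, ] + drop(Lambda_k %*% factors[i, ]) + noise
  }
  list(data = data, class = class, factorLoadings = loadings, means = means,
       variance = diag(1 / sINV_diag), factors = factors, weights = weights)
}

set.seed(1)
sketch <- simulate_mfa_sketch(n = 8, p = 5, q = 2, K = 2, sINV_diag = 1 / (1:5))
str(sketch, max.level = 1)
```

The returned list mirrors the entry names above, so each observation in data is the cluster mean plus the loadings applied to that observation's row of factors, plus noise whose variance is the diagonal of variance.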

Arguments

sameSigma

Logical. If TRUE, all components share a common covariance matrix of errors (see sINV_values).

sameLambda

Logical. As the name suggests, it indicates whether the factor loadings are the same for all components.

p

The dimension of the multivariate normal distribution (\(p > 1\)).

q

Number of factors. It should be strictly smaller than p.

K.true

The number of mixture components (clusters).

n

Sample size.

loading_means

A vector which contains the means of blocks of factor loadings.

Default: loading_means = c(-30,-20,-10,10, 20, 30).

loading_sd

A vector which contains the standard deviations of blocks of factor loadings.

Default: loading_sd = rep(2, length(loading_means)).

sINV_values

If sameSigma = TRUE: a vector of length \(p\) which contains the values of the diagonal of the (common) inverse covariance matrix. If sameSigma = FALSE: a \(K\times p\) matrix which contains the values of the diagonal of the inverse covariance matrix per component.

Default: sINV_values = rgamma(p, shape = 1, rate = 1).
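
As a hedged illustration of the matrix form of sINV_values, the call below passes one row of inverse error variances per component together with sameSigma = FALSE. The object names sINV_matrix and hetero and the numeric values are chosen only for this example.

```r
library(fabMix)

n <- 8; p <- 5; q <- 2; K <- 2

# One row per component (K x p); the values are illustrative only.
sINV_matrix <- rbind(1 / (1:p), rep(2, p))

set.seed(100)
hetero <- simData(sameSigma = FALSE, sameLambda = TRUE, K.true = K, n = n,
                  q = q, p = p, sINV_values = sINV_matrix)
dim(hetero$data)   # n x p according to the Value section
```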

Author

Panagiotis Papastamoulis

Examples

library('fabMix')

n = 8                # sample size
p = 5                # number of variables
q = 2                # number of factors
K = 2                # true number of clusters

sINV_diag = 1/((1:p))    # diagonal of inverse variance of errors
set.seed(100)
syntheticDataset <- simData(sameLambda = TRUE, K.true = K, n = n, q = q, p = p,
                            sINV_values = sINV_diag)
summary(syntheticDataset)
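
The entries listed under Value can then be inspected individually. The snippet below repeats the call so that it runs on its own; it is a usage sketch and makes no claim about the particular values drawn.

```r
library(fabMix)

n <- 8; p <- 5; q <- 2; K <- 2
set.seed(100)
syntheticDataset <- simData(sameLambda = TRUE, K.true = K, n = n, q = q, p = p,
                            sINV_values = 1 / (1:p))

dim(syntheticDataset$data)            # n x p simulated observations
table(syntheticDataset$class)         # cluster sizes
dim(syntheticDataset$factorLoadings)  # K.true x p x q loadings
dim(syntheticDataset$factors)         # n x q factor values
syntheticDataset$weights              # mixing weights, one per cluster
```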