- N, G, P
Desired overall number of observations, number of clusters, and number of variables in the simulated data set. Each must be a single integer.
- Q
Desired number of cluster-specific latent factors in the simulated data set. Can be specified either as a single integer, if all clusters are to have the same number of factors, or as a vector of length G. Defaults to floor(log(P)) in each cluster. Should be less than the associated Ledermann bound and the number of observations in the corresponding cluster. The argument forceQg can be used to enforce this upper limit. It is also advisable that Q <= floor((P - 1)/2), but this restriction is not enforced by forceQg.
- pis
Mixing proportions of the clusters in the data set if G > 1. Must sum to 1. Defaults to rep(1/G, G).
- mu
True values of the mean parameters, either as a single value, a vector of length G, a vector of length P, or a G * P matrix. If mu is missing, loc.diff is invoked to simulate distinct means for each cluster by default.
- psi
True values of uniqueness parameters, either as a single value, a vector of length G, a vector of length P, or a G * P matrix. As such, the user can specify uniquenesses as a diagonal or isotropic matrix, and further constrain uniquenesses across clusters if desired. If psi is missing, uniquenesses are simulated via 1/rgamma(P, 2, 1) within each cluster by default.
- loadings
True values of the loadings matrix/matrices. Must be supplied in the form of a list of numeric matrices when G > 1, otherwise a single matrix. Matrices must contain P rows and the number of columns must correspond to the values in Q. If loadings are not supplied, such matrices are populated with standard normal random variates by default (see non.zero).
- scores
True values of the latent factor scores, as an N * max(Q) numeric matrix. If scores are not supplied, such a matrix is populated with standard normal random variates by default. Only relevant when method="conditional".
- nn
An alternative way to specify the size of each cluster, by giving the exact number of observations in each cluster explicitly. Must sum to N.
- loc.diff
A parameter to control the closeness of the clusters in terms of the difference in their location vectors. Only relevant if mu is NOT supplied. Defaults to 2.
More specifically, when loc.diff is used, the means are simulated with the vector of cluster-specific hypermeans given by scale(1:G, center=TRUE, scale=FALSE) * loc.diff.
- non.zero
Controls the number of non-zero entries in each loadings column (per cluster); only used when loadings is not explicitly supplied. Values must be integers in the interval [1,P]. Defaults to P. The positions of the zeros are randomised, and non-zero entries are drawn from a standard normal.
When G>1, must be given as a list of length G of vectors of length corresponding to Q. When G=1, can be given either as such a list or simply as a vector of length Q. Alternatively, a single integer can be supplied, common across all loadings columns across all clusters. In any case, non.zero will be affected by forceQg=TRUE by default (see below).
- forceQg
A logical indicating whether the upper limit on the number of cluster-specific factors Q is enforced. Defaults to TRUE for sim_IMIFA_data, but is always FALSE for sim_IMIFA_model. Note that when forceQg=TRUE, non.zero (see above) is also affected. The upper limit is determined by the Ledermann bound and by the requirement that Q be less than the number of observations in the given cluster. It is also advisable that Q <= floor((P - 1)/2), but this restriction is not enforced by forceQg.
- method
A switch indicating whether the mixture to be simulated from is the conditional distribution of the data given the latent variables (default), or simply the marginal distribution of the data.
- res
An object of class "Results_IMIFA" generated by get_IMIFA_results.
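
The defaults described above can be mimicked in base R. The following is only a minimal sketch, not the package's internal code: the helper sim_column and the variable names are hypothetical, the values of G, P, and non.zero are arbitrary, and the Ledermann bound is computed from its standard closed form, which is assumed here to match the bound the package applies.

```r
# Illustrative base-R sketch of the documented defaults (not package internals).
set.seed(1)
G <- 3    # number of clusters (arbitrary choice for illustration)
P <- 20   # number of variables (arbitrary choice for illustration)

# Default number of factors in each cluster: floor(log(P)).
Q <- rep(floor(log(P)), G)

# Standard closed form of the Ledermann bound (assumed to be the one used).
ledermann <- floor((2 * P + 1 - sqrt(8 * P + 1)) / 2)

# Cluster-specific hypermeans implied by loc.diff when mu is not supplied.
loc.diff   <- 2
hypermeans <- as.vector(scale(1:G, center = TRUE, scale = FALSE)) * loc.diff
# hypermeans is -2, 0, 2 for G = 3 and loc.diff = 2

# Default uniquenesses: 1/rgamma(P, 2, 1) drawn separately within each cluster.
psi <- replicate(G, 1 / rgamma(P, 2, 1))   # P x G matrix, one column per cluster

# One loadings column with exactly `nz` non-zero standard normal entries
# placed at random positions (hypothetical helper mirroring non.zero).
sim_column <- function(P, nz) {
  column      <- numeric(P)
  idx         <- sample.int(P, nz)
  column[idx] <- rnorm(nz)
  column
}
non.zero <- 5
loadings <- replicate(Q[1], sim_column(P, non.zero))   # P x Q[1] matrix
```

Comparing Q against ledermann (and against the cluster sizes) is the kind of check that forceQg is documented to enforce; the advisory limit floor((P - 1)/2) would need to be checked separately.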