Generate sample clustered binary data with cluster labels. For each
clustering variable, the probability of a '1' in each cluster is drawn from a
Beta(1, 5) distribution, which favours small probabilities (mean 1/6) that
vary across clusters. For noisy variables, the probability of a '1' is also
drawn from a Beta(1, 5) distribution, but this probability is the same
regardless of the cluster membership of the observation.
If yout = TRUE, a vector containing the binary outcome variable is also
returned.
Arguments
n
Number of observations in the dataset.
K
Number of clusters desired.
w
A vector of mixture weights (proportion of population in each
cluster).
p
Number of clustering variables/covariates in the dataset.
Irrp
Number of irrelevant/noisy variables/covariates in the dataset. Note
that these variables will be the final Irrp columns in the simulated
dataset. The total number of columns is p + Irrp.
yout
Logical indicating whether a binary outcome associated with the
clustering should also be generated. Default is FALSE.
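
The generating process described above can be sketched in base R. This is an illustrative sketch only, not the package's own implementation: the function name simulate_clustered_binary and the names of the returned list elements are hypothetical. The documentation does not say how the outcome is generated, so the yout branch below simply assumes one outcome probability per cluster, drawn uniformly at random.

```r
# Hypothetical helper illustrating the described process (not the package's code).
simulate_clustered_binary <- function(n, K, w, p, Irrp, yout = FALSE) {
  stopifnot(length(w) == K, abs(sum(w) - 1) < 1e-8)

  # Assign each observation to a cluster according to the mixture weights.
  labels <- sample.int(K, size = n, replace = TRUE, prob = w)

  # K x p matrix of cluster-specific probabilities of a '1', drawn from Beta(1, 5).
  theta <- matrix(rbeta(K * p, shape1 = 1, shape2 = 5), nrow = K, ncol = p)

  # Clustering variables: Bernoulli draws using each observation's cluster row.
  prob_mat <- theta[labels, , drop = FALSE]
  X <- matrix(rbinom(n * p, size = 1, prob = prob_mat), nrow = n, ncol = p)

  # Noisy variables: one Beta(1, 5) probability per variable, shared by all
  # clusters, appended as the final Irrp columns.
  if (Irrp > 0) {
    phi <- rbeta(Irrp, shape1 = 1, shape2 = 5)
    noise <- matrix(rbinom(n * Irrp, size = 1, prob = rep(phi, each = n)),
                    nrow = n, ncol = Irrp)
    X <- cbind(X, noise)
  }

  out <- list(data = X, labels = labels)

  # ASSUMPTION: the outcome mechanism is not documented; here each cluster
  # gets its own outcome probability, drawn from Uniform(0, 1).
  if (yout) {
    outcome_prob <- runif(K)
    out$outcome <- rbinom(n, size = 1, prob = outcome_prob[labels])
  }
  out
}

set.seed(1)
sim <- simulate_clustered_binary(n = 200, K = 3, w = c(0.5, 0.3, 0.2),
                                 p = 10, Irrp = 4, yout = TRUE)
dim(sim$data)  # 200 rows, p + Irrp = 14 columns
```

Because the noisy columns use a single probability for all clusters, they carry no information about cluster membership, which is what makes them useful for testing variable selection.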