A simulated dataset for integrated clustering with a binary outcome. The data are simulated with K = 2 clusters.
sim2
A matrix with 22 columns:
Genetic features: G1 to G5 are causal genes that contribute to clustering, with OR = 2; G6 to G10 are null genes unrelated to clustering.
Biomarkers: Z1 to Z5 are causal biomarkers whose means differ by delta Z = 4 between the 2 clusters; Z6 to Z10 are noise with delta Z = 0. All biomarkers are assumed to be mutually independent.
Outcome of interest: a binary outcome whose odds ratio between the two clusters is 2.
Latent cluster assignment for each observation.
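The generating code for `sim2` is not shown here, so the following is only a minimal Python sketch of a data set with the same layout, written to make the described effect sizes concrete. Everything beyond the listed effects is an assumption: the sample size, the binary coding and allele frequency of the genes, the logistic models and their intercepts, unit-variance normal biomarkers, and a column order of genes, biomarkers, outcome, then cluster. The function name `simulate_sim2_like` is hypothetical.

```python
import math
import random


def logistic(x: float) -> float:
    """Inverse logit: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_sim2_like(n: int = 500, maf: float = 0.3, seed: int = 1):
    """Hypothetical generator mimicking the 22-column layout described above.

    Assumptions (not taken from the package): genes are Bernoulli(maf),
    biomarkers are N(mean, 1), and the intercepts -1.0 and -0.5 are arbitrary.
    Returns a list of rows: G1..G10, Z1..Z10, outcome, latent cluster.
    """
    rng = random.Random(seed)
    log_or = math.log(2.0)  # OR = 2 expressed on the log-odds scale
    rows = []
    for _ in range(n):
        # G1..G10: binary genetic features (assumed coding)
        g = [1 if rng.random() < maf else 0 for _ in range(10)]
        # Latent cluster (0/1): only G1..G5 shift the log-odds, each by log(2)
        eta = -1.0 + log_or * sum(g[:5])
        x = 1 if rng.random() < logistic(eta) else 0
        # Z1..Z5: means differ by 4 between clusters; Z6..Z10: pure noise
        z = [rng.gauss(4.0 * x, 1.0) for _ in range(5)]
        z += [rng.gauss(0.0, 1.0) for _ in range(5)]
        # Binary outcome: odds ratio of cluster 1 versus cluster 0 is 2
        y = 1 if rng.random() < logistic(-0.5 + log_or * x) else 0
        rows.append(g + z + [y, x])
    return rows


if __name__ == "__main__":
    data = simulate_sim2_like()
    print(len(data), "rows x", len(data[0]), "columns")
```

Because G6 to G10 and Z6 to Z10 never enter the cluster or outcome models, they play the role of the null genes and noise biomarkers; comparing the per-cluster means of Z1 and Z6 in the output should show a gap near 4 and near 0, respectively.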