data_generation: Generate data for simulations.

Description

Generate data for simulations. All models used in Tian, Y., Weng, H., & Feng, Y. (2022) are implemented.

Usage

data_generation(
  K = 10,
  outlier_K = 1,
  simulation_no = c("MTL-1", "MTL-2"),
  h_w = 0.1,
  h_mu = 1,
  n = 50
)

Value

A list of two sub-lists, "data" and "parameter".

List "data" contains:
x: a list of design matrices
y: a list of hidden labels
outlier_index: a vector of outlier task indices

List "parameter" contains:
w: a vector of mixture proportions
mu1: a matrix in which each column is the GMM mean of the first cluster of each task
mu2: a matrix in which each column is the GMM mean of the second cluster of each task
beta: a matrix in which each column is the discriminant coefficient in each task
Sigma: a list of covariance matrices for each task

Arguments

K: the number of tasks (data sets). Default: 10
outlier_K: the number of outlier tasks. Default: 1
simulation_no: simulation number in Tian, Y., Weng, H., & Feng, Y. (2022). Can be "MTL-1" or "MTL-2". Default: "MTL-1"
h_w: the value of h_w. Default: 0.1
h_mu: the value of h_mu. Default: 1
n: the sample size of each task. Can be either a positive integer or a vector of length K. If it is an integer, all tasks have the same sample size n. If it is a vector, its k-th element is the sample size of the k-th task. Default: 50

References

Tian, Y., Weng, H., & Feng, Y. (2022). Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models. arXiv preprint arXiv:2209.15224.

Examples

data_list <- data_generation(K = 5, outlier_K = 1, simulation_no = "MTL-1",
                             h_w = 0.1, h_mu = 1, n = 50)
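
A further sketch of how the n argument and the returned list fit together. It assumes the function is loaded from the mtlgmm package (the package name is not stated on this page) and that each design matrix stores one observation per row. The variable name sizes is illustrative only.

```r
# Assumption: data_generation() is provided by the mtlgmm package.
library(mtlgmm)

set.seed(1)  # simulated data are drawn at random; fix the seed to reproduce a run

# One sample size per task; the vector length must equal K
sizes <- c(30, 40, 50, 60, 70)
data_list <- data_generation(K = 5, outlier_K = 1, simulation_no = "MTL-1",
                             h_w = 0.1, h_mu = 1, n = sizes)

names(data_list)                # the two sub-lists, "data" and "parameter"
length(data_list$data$x)        # number of design matrices
sapply(data_list$data$x, nrow)  # rows per task (assumed to be observations)
data_list$data$outlier_index    # indices of the outlier tasks
dim(data_list$parameter$mu1)    # first-cluster means, one column per task
```

Passing a single integer for n instead of a vector gives every task the same sample size, as in the first example.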