- x
design matrix of the target data set. Should be a matrix or data.frame object.
- fitted_bar
the output from the mtlgmm function.
- step_size
the step size choice in the proximal gradient method used to solve each optimization problem in the revised EM algorithm (Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022)). Can be either "lipschitz" or "fixed". Default = "lipschitz".
lipschitz: eta_w, eta_mu, and eta_beta will be chosen via the Lipschitz property of the gradient of the objective function (without the penalty part). See Section 4.2 of Parikh, N., & Boyd, S. (2014).
fixed: eta_w, eta_mu, and eta_beta need to be specified manually.
- eta_w
step size in the proximal gradient method to learn w (Step 3 of Algorithm 4 in Tian, Y., Weng, H., & Feng, Y. (2022)). Default: 0.1. Only used when step_size = "fixed".
- eta_mu
step size in the proximal gradient method to learn mu (Steps 4 and 5 of Algorithm 4 in Tian, Y., Weng, H., & Feng, Y. (2022)). Default: 0.1. Only used when step_size = "fixed".
- eta_beta
step size in the proximal gradient method to learn beta (Step 7 of Algorithm 4 in Tian, Y., Weng, H., & Feng, Y. (2022)). Default: 0.1. Only used when step_size = "fixed".
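To make the role of a fixed step size concrete, here is a hedged sketch of a generic proximal gradient update on a toy L1-penalized quadratic objective. This is the general method referenced above, not the package's internal implementation; the names soft_threshold, grad_f, and b are illustrative only.

```r
# Generic proximal gradient iteration with a fixed step size eta
# (illustrative sketch only; not the package's internal code).
# Minimize f(theta) + lambda * ||theta||_1 via
#   theta <- soft_threshold(theta - eta * grad_f(theta), eta * lambda)
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Toy quadratic objective: f(theta) = 0.5 * ||theta - b||^2
b <- c(3, -0.05, 1)
grad_f <- function(theta) theta - b

eta <- 0.1      # fixed step size, analogous to eta_w / eta_mu / eta_beta
lambda <- 0.1
theta <- rep(0, 3)
for (i in 1:200) {
  theta <- soft_threshold(theta - eta * grad_f(theta), eta * lambda)
}
theta  # the small coefficient is shrunk to exactly zero
```

With step_size = "lipschitz", eta would instead be derived from the Lipschitz constant of grad_f rather than fixed in advance.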
- lambda_choice
the choice of constants in the penalty parameter used in the optimization problems. See Algorithm 4 of Tian, Y., Weng, H., & Feng, Y. (2022). Can be either "fixed" or "cv". Default = "cv".
cv: cv_nfolds, cv_lower, cv_upper, and cv_length need to be specified. The C1 and C2 parameters will then be chosen among all combinations of values in exp(seq(log(cv_lower/10), log(cv_upper/10), length.out = cv_length)) via cross-validation. Note that this is a two-dimensional CV process, because we set C1_w = C2_w and C1_mu = C1_beta = C2_mu = C2_beta to reduce the computational cost.
fixed: C1_w, C1_mu, C1_beta, C2_w, C2_mu, and C2_beta need to be specified. See equations (19)-(24) in Tian, Y., Weng, H., & Feng, Y. (2022).
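The candidate grid used by lambda_choice = "cv" can be reproduced directly in R from the expression above. The values below are the defaults listed for cv_lower, cv_upper, and cv_length; the names candidates and grid are illustrative.

```r
# Candidate penalty constants, as described for lambda_choice = "cv"
cv_lower <- 0.01   # default lower bound
cv_upper <- 5      # default upper bound
cv_length <- 5     # default number of candidate values

candidates <- exp(seq(log(cv_lower/10), log(cv_upper/10), length.out = cv_length))

# Two-dimensional grid: C1_w = C2_w varies over one axis,
# C1_mu = C1_beta = C2_mu = C2_beta over the other
grid <- expand.grid(C_w = candidates, C_mu_beta = candidates)
nrow(grid)  # cv_length^2 = 25 combinations to evaluate by cross-validation
```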
- cv_nfolds
the number of cross-validation folds. Default: 5
- cv_upper
the upper bound of lambda values used in cross-validation. Default: 5
- cv_lower
the lower bound of lambda values used in cross-validation. Default: 0.01
- cv_length
the number of lambda values considered in cross-validation. Default: 5
- C1_w
the initial value of C1_w. See equation (19) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.05
- C1_mu
the initial value of C1_mu. See equation (20) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.2
- C1_beta
the initial value of C1_beta. See equation (21) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.2
- C2_w
the initial value of C2_w. See equation (22) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.05
- C2_mu
the initial value of C2_mu. See equation (23) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.2
- C2_beta
the initial value of C2_beta. See equation (24) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.2
- kappa0
the decaying rate used in equations (19)-(24) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 1/3
- tol
the convergence tolerance used in all optimization problems. If the difference between the last update and the current update is less than this value, the optimization iterations will stop. Default: 1e-05
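As a hedged illustration of how such a tolerance is typically used (not the package's internal code), a stopping rule of this kind iterates an update until successive estimates differ by less than tol:

```r
# Illustrative stopping rule on a toy fixed-point update
# (converges to sqrt(2)); not the package's internal code
tol <- 1e-05
est_old <- 1
for (iter in 1:1000) {
  est_new <- (est_old + 2 / est_old) / 2   # toy update
  if (abs(est_new - est_old) < tol) break
  est_old <- est_new
}
est_new  # approximately sqrt(2)
```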
- initial_method
initialization method. This indicates the method used to initialize the estimates of the GMM parameters for each data set. Can be either "kmeans" or "EM".
kmeans: the initial estimates of the GMM parameters will be generated by the single-task k-means algorithm, via the kmeans function in the stats package.
EM: the initial estimates of the GMM parameters will be generated by the single-task EM algorithm, via the Mclust function in the mclust package.
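A minimal sketch of a k-means-style initialization for a two-component GMM, assuming cluster assignments are turned into mixing proportions, means, and a pooled covariance. The package's internal initialization may differ in detail; the names w_init, mu_init, and Sigma_init are illustrative.

```r
# Hedged sketch: initializing 2-component GMM parameters from k-means
# (the package's actual internal initialization may differ)
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 3), ncol = 2))

km <- stats::kmeans(x, centers = 2)

w_init     <- mean(km$cluster == 1)              # mixing proportion of component 1
mu_init    <- km$centers                         # 2 x p matrix of component means
Sigma_init <- cov(x - km$centers[km$cluster, ])  # pooled covariance estimate
```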
- iter_max
the maximum number of iterations of the revised EM algorithm (i.e., the parameter T in Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022)). Default: 1000
- iter_max_prox
the maximum number of iterations of the proximal gradient method. Default: 100
- ncores
the number of cores to use. Parallel computing is strongly suggested, especially when lambda_choice = "cv". Default: 1