lasso_variants: Lasso, (fitted) group lasso, and (fitted) sparse-group lasso

Description

Fit a mixed model with the lasso, group lasso, or sparse-group lasso via proximal gradient descent. Because the algorithm is iterative, the step size for each iteration is determined via backtracking line search. A grid search for the regularization parameter $\lambda$ is performed using warm starts. The mixed model has the form: $$y = X b + Z u + residual.$$ The penalty of the sparse-group lasso (without additional weights for features) is then: $$\alpha \lambda ||u||_1 + (1 - \alpha) \lambda \sum_l \omega^G_l ||u^{(l)}||_2.$$ If $\alpha = 1$, this leads to the lasso. If $\alpha = 0$, this leads to the group lasso. Furthermore, if the $l_2$-norm is applied to the fitted values $Z^{(l)} u^{(l)}$ instead of to $u^{(l)}$, two more algorithms may be called: the fitted group lasso or the fitted sparse-group lasso.
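
The penalty formula above can be evaluated directly. The following is a minimal sketch in Python, written for illustration only; it is not the package's code, and the function name, argument names, and the dictionary of group weights are assumptions. It shows that $\alpha = 1$ reduces the penalty to the lasso term and $\alpha = 0$ to the group lasso term.

```python
import math


def sparse_group_lasso_penalty(u, groups, group_weights, alpha, lam):
    """alpha*lam*||u||_1 + (1 - alpha)*lam*sum_l w_l*||u^(l)||_2 (hypothetical helper)."""
    l1_norm = sum(abs(value) for value in u)
    # Accumulate the squared entries of u separately for each group label.
    squared_sums = {}
    for value, group in zip(u, groups):
        squared_sums[group] = squared_sums.get(group, 0.0) + value * value
    weighted_l2 = sum(
        group_weights[group] * math.sqrt(total)
        for group, total in squared_sums.items()
    )
    return alpha * lam * l1_norm + (1.0 - alpha) * lam * weighted_l2


u = [3.0, -4.0, 0.0, 2.0]
groups = [1, 1, 2, 2]            # entries 1-2 form group 1, entries 3-4 group 2
weights = {1: 1.0, 2: 1.0}       # unit group weights, chosen for the example

lasso_value = sparse_group_lasso_penalty(u, groups, weights, alpha=1.0, lam=0.5)
group_value = sparse_group_lasso_penalty(u, groups, weights, alpha=0.0, lam=0.5)
```

With these inputs the lasso term is 0.5 * (3 + 4 + 0 + 2) and the group lasso term is 0.5 * (5 + 2).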

Usage

seagull_fitted_group_lasso(
  VECTOR_Yc,
  Y_MEAN,
  MATRIX_Xc,
  VECTOR_Xc_MEANS,
  VECTOR_Xc_STANDARD_DEVIATIONS,
  VECTOR_WEIGHTS_FEATURESc,
  VECTOR_WEIGHTS_GROUPSc,
  VECTOR_FULL_COLUMN_RANK,
  VECTOR_GROUPS,
  VECTOR_BETAc,
  VECTOR_INDEX_PERMUTATION,
  VECTOR_INDEX_EXCLUDE,
  EPSILON_CONVERGENCE,
  ITERATION_MAX,
  GAMMA,
  LAMBDA_MAX,
  PROPORTION_XI,
  DELTA,
  NUMBER_INTERVALS,
  NUMBER_FIXED_EFFECTS,
  NUMBER_VARIABLES,
  INTERNAL_STANDARDIZATION,
  TRACE_PROGRESS
)
seagull_fitted_sparse_group_lasso(
  VECTOR_Yc,
  Y_MEAN,
  MATRIX_Xc,
  VECTOR_Xc_MEANS,
  VECTOR_Xc_STANDARD_DEVIATIONS,
  VECTOR_WEIGHTS_FEATURESc,
  VECTOR_WEIGHTS_GROUPSc,
  VECTOR_FULL_COLUMN_RANK,
  VECTOR_GROUPS,
  VECTOR_BETAc,
  VECTOR_INDEX_PERMUTATION,
  VECTOR_INDEX_EXCLUDE,
  ALPHA,
  EPSILON_CONVERGENCE,
  ITERATION_MAX,
  LAMBDA_MAX,
  PROPORTION_XI,
  DELTA,
  STEP_SIZE,
  NUMBER_INTERVALS,
  NUMBER_FIXED_EFFECTS,
  NUMBER_VARIABLES,
  INTERNAL_STANDARDIZATION,
  TRACE_PROGRESS
)
seagull_group_lasso(
  VECTOR_Yc,
  Y_MEAN,
  MATRIX_Xc,
  VECTOR_Xc_MEANS,
  VECTOR_Xc_STANDARD_DEVIATIONS,
  VECTOR_WEIGHTS_FEATURESc,
  VECTOR_GROUPS,
  VECTOR_BETAc,
  VECTOR_INDEX_PERMUTATION,
  VECTOR_INDEX_EXCLUDE,
  EPSILON_CONVERGENCE,
  ITERATION_MAX,
  GAMMA,
  LAMBDA_MAX,
  PROPORTION_XI,
  NUMBER_INTERVALS,
  NUMBER_FIXED_EFFECTS,
  NUMBER_VARIABLES,
  INTERNAL_STANDARDIZATION,
  TRACE_PROGRESS
)
seagull_lasso(
  VECTOR_Yc,
  Y_MEAN,
  MATRIX_Xc,
  VECTOR_Xc_MEANS,
  VECTOR_Xc_STANDARD_DEVIATIONS,
  VECTOR_WEIGHTS_FEATURESc,
  VECTOR_BETAc,
  VECTOR_INDEX_EXCLUDE,
  EPSILON_CONVERGENCE,
  ITERATION_MAX,
  GAMMA,
  LAMBDA_MAX,
  PROPORTION_XI,
  NUMBER_INTERVALS,
  NUMBER_FIXED_EFFECTS,
  NUMBER_VARIABLES,
  INTERNAL_STANDARDIZATION,
  TRACE_PROGRESS
)
seagull_sparse_group_lasso(
  VECTOR_Yc,
  Y_MEAN,
  MATRIX_Xc,
  VECTOR_Xc_MEANS,
  VECTOR_Xc_STANDARD_DEVIATIONS,
  VECTOR_WEIGHTS_FEATURESc,
  VECTOR_GROUPS,
  VECTOR_BETAc,
  VECTOR_INDEX_PERMUTATION,
  VECTOR_INDEX_EXCLUDE,
  ALPHA,
  EPSILON_CONVERGENCE,
  ITERATION_MAX,
  GAMMA,
  LAMBDA_MAX,
  PROPORTION_XI,
  NUMBER_INTERVALS,
  NUMBER_FIXED_EFFECTS,
  NUMBER_VARIABLES,
  INTERNAL_STANDARDIZATION,
  TRACE_PROGRESS
)

Arguments

VECTOR_Yc

numeric vector of observations.

Y_MEAN

arithmetic mean of VECTOR_Yc.

MATRIX_Xc

numeric design matrix relating y to fixed and random effects $[X Z]$. The columns may be permuted corresponding to their group assignments.

VECTOR_Xc_MEANS

numeric vector of arithmetic means of each column of MATRIX_Xc.

VECTOR_Xc_STANDARD_DEVIATIONS

numeric vector of estimates of standard deviations of each column of MATRIX_Xc. Values are calculated via the function colSds from the R-package matrixStats.

VECTOR_WEIGHTS_FEATURESc

numeric vector of weights for the vectors of fixed and random effects $[b^T, u^T]^T$. The entries may be permuted corresponding to their group assignments.

VECTOR_WEIGHTS_GROUPSc

numeric vector of pre-calculated weights for each group.

VECTOR_FULL_COLUMN_RANK

Boolean vector indicating whether the group-wise parts of the filtered matrix Z, i.e., $Z^{(l)}$ for each group l, have full column rank.

VECTOR_GROUPS

integer vector specifying which effect (fixed and random) belongs to which group.

VECTOR_BETAc

numeric vector whose partitions will be returned (partition 1: estimates of fixed effects, partition 2: predictions of random effects). During the computation the entries may be in permuted order, but they are returned in the order of the user's input.

VECTOR_INDEX_PERMUTATION

integer vector that contains information about the original order of the user's input.

VECTOR_INDEX_EXCLUDE

integer vector containing the indices of every column that was filtered out due to low standard deviation. This vector only has an effect if standardize = TRUE is used.

EPSILON_CONVERGENCE

value for the relative accuracy of the solution, used to stop the algorithm for the current value of $\lambda$. The algorithm stops after iteration m if: $$||sol^{(m)} - sol^{(m-1)}||_\infty < \epsilon_c * ||sol^{(m-1)}||_2.$$
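
The stopping rule compares the largest absolute change between two consecutive iterates with the Euclidean norm of the previous iterate. A minimal Python sketch of this check, for illustration only (the function and variable names are assumptions, not part of the package):

```python
import math


def has_converged(sol_new, sol_old, epsilon_c):
    """Return True if the max-norm change is below epsilon_c times ||sol_old||_2."""
    max_change = max(abs(new - old) for new, old in zip(sol_new, sol_old))
    norm_old = math.sqrt(sum(old * old for old in sol_old))
    return max_change < epsilon_c * norm_old


small_change = has_converged([1.0, 2.0], [1.0, 2.0001], epsilon_c=1e-3)
large_change = has_converged([1.0, 2.0], [1.5, 2.0], epsilon_c=1e-3)
```

Because the inequality is strict, this sketch never reports convergence when the previous iterate is the zero vector.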

ITERATION_MAX

maximum number of iterations for each value of the penalty parameter $\lambda$. The calculation for a value of $\lambda$ ends at this limit if the algorithm has not converged according to EPSILON_CONVERGENCE by then.

GAMMA

multiplicative parameter to decrease the step size during backtracking line search. Has to satisfy: $0 < \gamma < 1$.
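
The role of $\gamma$ can be illustrated with one proximal gradient step for the lasso. This is a sketch in Python under stated assumptions, not the package's implementation: the acceptance test is the standard sufficient-decrease condition for proximal gradient methods, which this page does not spell out, and all function names are hypothetical. The step size is multiplied by $\gamma$ until the condition holds.

```python
def soft_threshold(value, threshold):
    """Proximal operator of threshold * |.| applied to a scalar."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def backtracking_prox_step(x, loss, gradient, lam, step_size, gamma):
    """One lasso proximal gradient step; shrinks step_size by gamma until accepted."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma has to satisfy 0 < gamma < 1")
    grad = gradient(x)
    loss_x = loss(x)
    while True:
        x_new = [
            soft_threshold(xi - step_size * gi, step_size * lam)
            for xi, gi in zip(x, grad)
        ]
        diff = [new - old for new, old in zip(x_new, x)]
        # Quadratic upper bound of the smooth loss around x (assumed condition).
        bound = (
            loss_x
            + sum(gi * di for gi, di in zip(grad, diff))
            + sum(di * di for di in diff) / (2.0 * step_size)
        )
        if loss(x_new) <= bound:
            return x_new, step_size
        step_size *= gamma


# Toy smooth loss 5 * (x - 1)^2; its gradient has Lipschitz constant 10.
def toy_loss(x):
    return 5.0 * (x[0] - 1.0) ** 2


def toy_gradient(x):
    return [10.0 * (x[0] - 1.0)]


x_next, accepted_step = backtracking_prox_step(
    [0.0], toy_loss, toy_gradient, lam=1.0, step_size=1.0, gamma=0.5
)
```

For this toy loss, a step is accepted only once the step size is at most 0.1, so starting from 1.0 with $\gamma = 0.5$ the step size is halved four times.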

LAMBDA_MAX

maximum value for the penalty parameter. This is the start value for the grid search of the penalty parameter $\lambda$.

PROPORTION_XI

multiplicative parameter to determine the minimum value of $\lambda$ for the grid search, i.e. $\lambda_{min} = \xi * \lambda_{max}$. Has to satisfy: $0 < \xi \le 1$. If $\xi = 1$, only a single solution for $\lambda = \lambda_{max}$ is calculated.

DELTA

numeric value, which is squared and added to the main diagonal of $Z^{(l)T} Z^{(l)}$ for group l, if this matrix is not invertible.
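
The effect of DELTA can be seen on a small matrix with two identical columns. This Python sketch is illustrative only and uses hypothetical function names; unlike the package, which applies the correction only when the matrix is not invertible, the helper below always adds $\delta^2$ and leaves that decision to the caller.

```python
def regularized_gram(z, delta):
    """Return Z^T Z + delta^2 * I for a matrix given as a list of rows."""
    n_cols = len(z[0])
    gram = [
        [sum(row[i] * row[j] for row in z) for j in range(n_cols)]
        for i in range(n_cols)
    ]
    for i in range(n_cols):
        gram[i][i] += delta * delta
    return gram


def determinant_2x2(matrix):
    return matrix[0][0] * matrix[1][1] - matrix[0][1] * matrix[1][0]


z_group = [[1.0, 1.0], [2.0, 2.0]]   # identical columns: Z^T Z is singular
det_plain = determinant_2x2(regularized_gram(z_group, delta=0.0))
det_ridge = determinant_2x2(regularized_gram(z_group, delta=0.1))
```

Without the correction the determinant is zero; with $\delta = 0.1$ it becomes strictly positive, so the matrix can be inverted.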

NUMBER_INTERVALS

number of values of $\lambda$ in the grid search between $\lambda_{max}$ and $\xi * \lambda_{max}$. The grid points are spaced logarithmically.
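
A logarithmic grid from $\lambda_{max}$ down to $\xi * \lambda_{max}$ can be sketched as follows. This Python example is illustrative; the exact spacing formula used by the package is not given on this page, so the formula below (equal steps in $\log \lambda$, both end points included) and the function name are assumptions.

```python
import math


def lambda_grid(lambda_max, xi, number_intervals):
    """Values of lambda, equally spaced on a log scale, from lambda_max to xi*lambda_max."""
    if not 0.0 < xi <= 1.0:
        raise ValueError("xi has to satisfy 0 < xi <= 1")
    if xi == 1.0 or number_intervals == 1:
        # Only a single solution, for lambda = lambda_max, is calculated.
        return [lambda_max]
    log_step = math.log(xi) / (number_intervals - 1)
    return [lambda_max * math.exp(i * log_step) for i in range(number_intervals)]


grid = lambda_grid(lambda_max=10.0, xi=0.01, number_intervals=3)
single = lambda_grid(lambda_max=10.0, xi=1.0, number_intervals=50)
```

With these inputs the grid is approximately 10, 1, 0.1. Running the grid from the largest value downward is what makes warm starts useful: each solution initializes the next, slightly less penalized, problem.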

NUMBER_FIXED_EFFECTS

non-negative integer to determine the number of fixed effects present in the mixed model.

NUMBER_VARIABLES

non-negative integer which corresponds to the total number of columns of the initial model matrices X and Z.

INTERNAL_STANDARDIZATION

if TRUE, the input vector y is centered, and each column of the input matrices X and Z is centered and scaled with an internal process. Additionally, a filter is applied to X and Z, which filters columns with standard deviation less than 1.e-7.
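
The centering, scaling, and filtering step can be sketched as follows. This is a Python illustration with hypothetical names, not the package's internal process. It assumes the sample standard deviation (denominator n - 1), and it reports excluded columns with 0-based indices, whereas R indices are 1-based.

```python
import statistics


def standardize_columns(matrix, threshold=1.0e-7):
    """Center and scale each column; drop columns whose standard deviation is below threshold."""
    columns = list(zip(*matrix))
    means = [statistics.mean(column) for column in columns]
    sds = [statistics.stdev(column) for column in columns]  # sample SD (n - 1)
    excluded = [j for j, sd in enumerate(sds) if sd < threshold]
    kept = [j for j in range(len(columns)) if j not in excluded]
    scaled = [[(row[j] - means[j]) / sds[j] for j in kept] for row in matrix]
    return scaled, means, sds, excluded


x_matrix = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]   # second column is constant
scaled, means, sds, excluded = standardize_columns(x_matrix)
```

In this example the constant second column is filtered out, and the remaining column is centered at zero with unit standard deviation.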

TRACE_PROGRESS

if TRUE, a message is printed to the screen after each finished loop of the $\lambda$ grid. This is particularly useful for larger data sets.

ALPHA

mixing parameter of the penalty terms. Satisfies: $0 < \alpha < 1$. The penalty term looks as follows: $$\alpha * "lasso penalty" + (1-\alpha) * "group lasso penalty".$$

STEP_SIZE

numeric value which represents the size of the step between consecutive iterations.

Value

A list of estimates and parameters relevant for the computation:

intercept: estimate for the intercept, if present in the model.
fixed_effects: estimates for the fixed effects b, if present in the model. Each row corresponds to a particular value of $\lambda$.
random_effects: predictions for the random effects u. Each row corresponds to a particular value of $\lambda$.
lambda: all values for $\lambda$ which were used during the grid search.
iterations: the number of iterations actually performed for each value of $\lambda$. If a value is equal to max_iter, then the algorithm most likely did not converge to rel_acc during the corresponding run of the grid search.

The following parameters are also returned, primarily for the purpose of comparison and repetition: alpha = ALPHA (only for the sparse-group lasso), max_iter = ITERATION_MAX, gamma_bls = GAMMA, xi = PROPORTION_XI, and loops_lambda = NUMBER_INTERVALS.