sgl_subsampling: Generic sparse group lasso subsampling procedure

Description

Subsampling procedure with support for parallel computations.

Usage

sgl_subsampling(module_name, PACKAGE, data, parameterGrouping = NULL,
  groupWeights = NULL, parameterWeights = NULL, alpha, lambda,
  d = 100, compute_lambda = length(lambda) == 1, training = NULL,
  test = NULL, responses = NULL, auto_response_names = TRUE,
  collapse = FALSE, max.threads = NULL, use_parallel = FALSE,
  algorithm.config = sgl.standard.config)

Arguments

module_name

reference to objective specific C++ routines.

PACKAGE

name of the calling package.

data

a list of data objects -- will be passed to the specified module.

parameterGrouping

grouping of parameters, a vector of length $p$. Each element of the vector specifies the group of the parameters in the corresponding column of $\beta$.

groupWeights

the group weights, a vector of length length(unique(parameterGrouping)) (the number of groups).

parameterWeights

a matrix of size $q \times p$.

alpha

the $\alpha$ value: 0 gives the group lasso penalty, 1 gives the lasso penalty, and a value between 0 and 1 gives a sparse group lasso penalty.

lambda

if compute_lambda = TRUE, lambda.min relative to lambda.max; otherwise the lambda sequence for the regularization path, given either as a vector or as a list of vectors (all of the same length) with one lambda sequence per subsample.

d

length of lambda sequence (ignored if compute_lambda = FALSE).

compute_lambda

whether the lambda sequence should be computed.

training

a list of training samples, each item of the list corresponding to a subsample. Each item in the list must be a vector with the indices of the training samples for the corresponding subsample. The length of the list must equal the length of the test list.

test

a list of test samples, each item of the list corresponding to a subsample. Each item in the list must be a vector with the indices of the test samples for the corresponding subsample. The length of the list must equal the length of the training list.

responses

a vector of responses to simplify and return (if NULL (default), no formatting will be done).

auto_response_names

if TRUE, response names are set automatically.

collapse

if TRUE the results will be collapsed and ordered into one result, resembling the output of sgl_predict (this is only valid if the test samples are not overlapping)

max.threads

Deprecated (will be removed in 2018); instead use use_parallel = TRUE and register a parallel backend (see package 'doParallel'). The maximal number of threads to be used.

use_parallel

If TRUE, the foreach loop will use %dopar%. The user must register the parallel backend.

algorithm.config

the algorithm configuration to be used.
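
The training and test arguments are parallel lists of index vectors, one pair per subsample. The following is a minimal sketch of one way to build them in base R. The names n, n_subsamples and train_frac, and the random 80/20 split, are illustrative assumptions and not part of the package. The commented lines show how a parallel backend is typically registered with 'doParallel' before calling with use_parallel = TRUE; they are left commented so the sketch runs without that package.

```r
# Hypothetical setup: 50 samples, 5 subsamples, 80% of samples used for training.
set.seed(1)
n <- 50
n_subsamples <- 5
train_frac <- 0.8

# One sorted vector of training indices per subsample.
training <- replicate(n_subsamples,
                      sort(sample(n, size = round(train_frac * n))),
                      simplify = FALSE)

# The matching test indices are the samples left out of each training set.
test <- lapply(training, function(idx) setdiff(seq_len(n), idx))

# Both lists must have the same length.
stopifnot(length(training) == length(test))

# Registering a parallel backend (requires the 'doParallel' package):
# library(doParallel)
# cl <- makeCluster(2)
# registerDoParallel(cl)
# ... call sgl_subsampling(..., training = training, test = test,
#                          use_parallel = TRUE) ...
# stopCluster(cl)
```

Test sets drawn this way can overlap across subsamples, so this layout would not be suitable with collapse = TRUE; a partition of the samples into disjoint test sets would be needed for that.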

Value

Y.true

the response, that is the y object in data as created by create.sgldata.

responses

content will depend on the C++ response class

features

number of features used in the models

parameters

number of parameters used in the models

lambda

the lambda sequences used (a vector or list of length length(training)).

Details

If no formatting is done (i.e. if responses = NULL) then the responses field contains a list of lists structured in the following way:

subsamples 1:

sample test[[1]][1]
- model (lambda) index 1
  - response elements
- model (lambda) index 2
  - response elements
- ...
sample test[[1]][2]
- model (lambda) index 1
  - response elements
- model (lambda) index 2
  - response elements
- ...
...

subsamples 2: ...

If responses = "rname", where rname is the name of a response, then a list is returned at responses$rname. The content of the list depends on the type of the response:

vector: a list with format subsamples -> models -> matrix of dimension $n_i \times q$ containing the responses for the corresponding model and subsample (where $q$ is the dimension of the response).

matrix: a list with format subsamples -> samples -> models -> the response matrix.
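
The unformatted layout (responses = NULL) can be navigated with nested [[ indexing. The sketch below builds a small mock object with the same nesting (subsamples -> samples -> models -> response elements). The element names link and class are made up for illustration, since the actual content depends on the C++ response class.

```r
# Mock of the unformatted `responses` field: 2 subsamples, 3 test samples
# each, 4 models (lambda values). Element names are hypothetical.
n_subsamples <- 2
n_test <- 3
n_models <- 4

mock_responses <- lapply(seq_len(n_subsamples), function(s) {
  lapply(seq_len(n_test), function(i) {
    lapply(seq_len(n_models), function(m) {
      list(link = c(s, i, m), class = m %% 2)
    })
  })
})

# Response elements for subsample 1, second test sample, model (lambda) index 3.
elem <- mock_responses[[1]][[2]][[3]]
elem$link

# Collect one response element across all models for a fixed subsample and sample.
classes <- sapply(mock_responses[[1]][[2]], function(model) model$class)
classes
```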