Subsampling procedure with support for parallel computations.
sgl_subsampling(module_name, PACKAGE, data, parameterGrouping = NULL,
groupWeights = NULL, parameterWeights = NULL, alpha, lambda,
d = 100, compute_lambda = length(lambda) == 1, training = NULL,
test = NULL, responses = NULL, auto_response_names = TRUE,
collapse = FALSE, max.threads = NULL, use_parallel = FALSE,
algorithm.config = sgl.standard.config)
module_name: reference to objective-specific C++ routines.
PACKAGE: name of the calling package.
data: a list of data objects; it will be passed to the specified module.
parameterGrouping: grouping of parameters, a vector of length \(p\). Each element of the vector specifies the group of the parameters in the corresponding column of \(\beta\).
groupWeights: the group weights, a vector of length length(unique(parameterGrouping)) (the number of groups).
parameterWeights: the parameter weights, a matrix of size \(q \times p\).
alpha: the \(\alpha\) value: 0 for group lasso, 1 for lasso; a value between 0 and 1 gives a sparse group lasso penalty.
lambda: lambda.min relative to lambda.max (if compute_lambda = TRUE), or the lambda sequence for the regularization path: a vector, or a list of vectors (of the same length) with the lambda sequence for each subsample.
d: length of the lambda sequence (ignored if compute_lambda = FALSE).
compute_lambda: should the lambda sequence be computed?
training: a list of training samples, each item of the list corresponding to a subsample. Each item in the list must be a vector with the indices of the training samples for the corresponding subsample. The length of the list must equal the length of the test list.
test: a list of test samples, each item of the list corresponding to a subsample. Each item in the list must be a vector with the indices of the test samples for the corresponding subsample. The length of the list must equal the length of the training list.
responses: a vector of responses to simplify and return (if NULL (default), no formatting will be done).
auto_response_names: set response names.
collapse: if TRUE, the results will be collapsed and ordered into one result, resembling the output of sgl_predict (this is only valid if the test samples are not overlapping).
max.threads: Deprecated (will be removed in 2018); instead use use_parallel = TRUE and register a parallel backend (see package 'doParallel'). The maximal number of threads to be used.
use_parallel: if TRUE, the foreach loop will use %dopar%. The user must register the parallel backend.
algorithm.config: the algorithm configuration to be used.
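The requirements on the training and test lists can be illustrated with a minimal base R sketch. The sample count, the number of subsamples, and the fold-based splitting scheme are assumptions chosen for illustration; they are not prescribed by sgl_subsampling.

```r
set.seed(1)
n_samples <- 20     # hypothetical number of samples in `data`
n_subsamples <- 5   # hypothetical number of subsamples

# Disjoint test sets: shuffle the sample indices and deal them into folds.
fold_id <- rep(seq_len(n_subsamples), length.out = n_samples)
test <- unname(split(sample(n_samples), fold_id))

# Each training set is the complement of the matching test set.
training <- lapply(test, function(idx) setdiff(seq_len(n_samples), idx))

# Checks mirroring the documented requirements.
stopifnot(length(training) == length(test))
non_overlapping <- anyDuplicated(unlist(test)) == 0  # needed for collapse = TRUE
```

Lists built this way could then be supplied as the training and test arguments. Because the test sets here do not overlap, collapse = TRUE would be a valid choice.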
the response, that is the y object in data as created by create.sgldata.
content will depend on the C++ response class
number of features used in the models
number of parameters used in the models
the lambda sequences used (a vector or list of length length(training)).
If no formatting is done (i.e. if responses = NULL), then the responses field contains a list of lists structured in the following way:
subsamples 1:
    sample test[[1]][1]
        model (lambda) index 1
            response elements
        model (lambda) index 2
            response elements
        ...
    sample test[[1]][2]
        model (lambda) index 1
            response elements
        model (lambda) index 2
            response elements
        ...
    ...
subsamples 2: ...
If responses = "rname", with rname the name of the response, then a list at responses$rname will be returned.
The content of the list will depend on the type of the response.
vector: A list with format subsamples -> models -> matrix of dimension \(n_i \times q\) containing the responses for the corresponding model and subsample (where \(q\) is the dimension of the response).
matrix: A list with format subsamples -> samples -> models -> the response matrix.
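The two list layouts can be compared on a mock object. The sketch below does not call sgl_subsampling: the object raw, the fake response values, and the helper name by_model are hypothetical, and the real content depends on the C++ response class.

```r
# Mock of the unformatted layout: raw[[subsample]][[sample]][[model]].
# Each response element is a hypothetical numeric vector of length q = 2.
n_models <- 3
test <- list(c(1, 4), c(2, 3, 5))  # hypothetical test indices per subsample

raw <- lapply(test, function(idx) {
  lapply(idx, function(i) {
    # fake response element: (sample index, model index)
    lapply(seq_len(n_models), function(m) c(i, m))
  })
})

# Response for subsample 2, sample test[[2]][2], model (lambda) index 3.
one_response <- raw[[2]][[2]][[3]]

# Rearranged as subsamples -> models -> matrix with one row per test sample,
# mirroring the layout described for vector-valued responses.
by_model <- lapply(raw, function(sub) {
  lapply(seq_len(n_models), function(m) {
    do.call(rbind, lapply(sub, function(smp) smp[[m]]))
  })
})

dim(by_model[[2]][[3]])  # rows: samples in test[[2]], columns: q
```

In this mock, each matrix has \(n_i\) rows (the number of test samples in subsample \(i\)) and \(q = 2\) columns.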