Computes M-fold or Leave-One-Out Cross-Validation scores based on a user-input
grid to determine the optimal sparsity parameter values for method block.splsda.
tune.block.splsda(X, Y,
  indY,
  ncomp = 2,
  test.keepX,
  already.tested.X,
  validation = "Mfold",
  folds = 10,
  dist = "max.dist",
  measure = "BER",
  weighted = TRUE,
  progressBar = TRUE,
  max.iter = 100,
  near.zero.var = FALSE,
  nrepeat = 1,
  design,
  scheme,
  mode,
  scale = TRUE,
  bias,
  init,
  tol = 1e-06,
  verbose,
  light.output = TRUE,
  cpus,
  name.save = NULL
)
X: a list of numeric matrices of predictors (the blocks to integrate). NAs are allowed.
Y: a factor or class vector giving the outcome class of each sample (for example breast.TCGA$data.train$subtype in the example below).
indY: to be supplied if Y is missing; indicates the position of the response matrix / vector in the list X.
ncomp: the number of components to include in the model.
test.keepX: a list whose length is the number of blocks in X (without the outcome). Each entry of this list is a numeric vector of the keepX values to test for that specific block.
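already.tested.X: optional, if ncomp > 1. The number of variables already selected on the earlier components, which are then not tuned again. The example below passes a named list with one entry per block.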
validation: character. What kind of (internal) validation to use, matching one of "Mfold" or "loo" (see below). Default is "Mfold".
folds: the number of folds in the M-fold cross-validation. See Details.
dist: distance metric used to estimate the classification error rate; should be a subset of "centroids.dist", "mahalanobis.dist" or "max.dist" (see Details).
measure: the misclassification measure to optimise. Two measures are available: the overall misclassification error ("overall") or the Balanced Error Rate ("BER").
weighted: tune using the performance of either the majority vote (FALSE) or the weighted vote (TRUE).
progressBar: by default set to TRUE to output the progress bar of the computation.
max.iter: integer, the maximum number of iterations.
near.zero.var: boolean, see the internal nearZeroVar function (should be set to TRUE in particular for data with many zero values). Default value is FALSE.
nrepeat: number of times the cross-validation process is repeated.
design: numeric matrix of size (number of blocks in X) x (number of blocks in X) with 0 or 1 values. A value of 1 (0) indicates a relationship (no relationship) between the blocks to be modelled. If Y is provided instead of indY, the design matrix is changed to include relationships to Y.
scheme: either "horst", "factorial" or "centroid". Default = "centroid", see reference.
mode: character string. What type of algorithm to use, (partially) matching one of "regression", "canonical", "invariant" or "classic". See Details. Default = "regression".
scale: boolean. If scale = TRUE, each block is standardized to zero mean and unit variance. Default = TRUE.
bias: boolean. Whether to use the biased or the unbiased estimator of the variance/covariance. Default = FALSE.
init: mode of initialization used in the algorithm, either by Singular Value Decomposition of the product of each block of X with Y ("svd") or of each block independently ("svd.single"). Default = "svd".
tol: convergence stopping value.
verbose: if set to TRUE, reports progress on computing.
light.output: if set to FALSE, the prediction/classification of each sample for each of test.keepX and each comp is returned.
cpus: number of CPUs to use when running the code in parallel.
name.save: character string for the name of the file to be saved.
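The weighted argument switches between two ways of combining the class predictions made by the individual blocks. The sketch below illustrates the idea in base R only; it is not the mixOmics implementation. The predicted classes, the block weights and the helper vote() are hypothetical, and ties are resolved by taking the first class, which is a simplification.

```r
# Hypothetical predicted class of 4 samples (rows) from each block (columns).
block_pred <- cbind(
  mrna    = c("A", "A", "B", "B"),
  mirna   = c("A", "B", "B", "A"),
  protein = c("B", "B", "B", "A")
)

# Hypothetical per-block weights (assumption: a larger weight means a more trusted block).
block_weight <- c(mrna = 0.9, mirna = 0.3, protein = 0.4)
equal_weight <- c(mrna = 1, mirna = 1, protein = 1)

# Sum the weights of the blocks voting for each class and return the winning class.
# Simplification: ties go to the first class in alphabetical order.
vote <- function(pred_row, weights) {
  scores <- tapply(weights, pred_row, sum)
  names(scores)[which.max(scores)]
}

majority_vote <- apply(block_pred, 1, vote, weights = equal_weight)
weighted_vote <- apply(block_pred, 1, vote, weights = block_weight)

print(rbind(majority = majority_vote, weighted = weighted_vote))
```

With these made-up numbers the two rules disagree on samples 2 and 4, where the single heavily weighted block outvotes the two lightly weighted ones.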
A list that contains:
error.rate: returns the prediction error for each test.keepX on each component, averaged across all repeats and subsampling folds. Standard deviation is also output. All error rates are also available as a list.
choice.keepX: returns the number of variables selected (optimal keepX) on each component, for each block.
choice.ncomp: returns the optimal number of components for the model fitted with $choice.keepX.
error.rate.class: returns the error rate for each level of Y and for each component computed with the optimal keepX.
predict: prediction values for each sample, each test.keepX, each comp and each repeat. Only if light.output = FALSE.
class: predicted class for each sample, each test.keepX, each comp and each repeat. Only if light.output = FALSE.
cor.value: computes the correlation between latent variables for two-factor sPLS-DA analysis.
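To make the two values of measure concrete, the following base R sketch computes the overall misclassification error and the Balanced Error Rate on a small made-up set of predictions. It assumes the usual definition of BER as the mean of the per-class error rates; the class labels and predictions are invented for illustration and the code is not taken from mixOmics.

```r
# Made-up true classes: one large class and two small ones.
truth <- factor(c(rep("Basal", 6), rep("Her2", 2), rep("LumA", 2)))

# Made-up predictions: Basal all correct, both Her2 wrong, one LumA wrong.
pred <- c(rep("Basal", 6), "Basal", "Basal", "LumA", "Basal")

wrong <- pred != as.character(truth)

# Overall error: fraction of misclassified samples, dominated by the large class.
overall <- mean(wrong)

# Per-class error rate, then BER as the unweighted mean over classes.
per_class <- tapply(wrong, truth, mean)
ber <- mean(per_class)

print(per_class)
print(c(overall = overall, BER = ber))
```

Here the overall error is 0.3 while the BER is 0.5, because the errors are concentrated in the two small classes.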
This tuning function should be used to tune the keepX parameters in the block.splsda function (N-integration with sparse Discriminant Analysis).
M-fold or LOO cross-validation is performed with stratified subsampling where all classes are represented in each fold.
If validation = "Mfold", M-fold cross-validation is performed. The number of folds to generate is specified in the argument folds.
If validation = "loo", leave-one-out cross-validation is performed. By default folds is set to the number of unique individuals.
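Stratified subsampling can be pictured with the base R sketch below, which assigns samples to M folds class by class so that every class appears in every fold. It is only an illustration under the assumption that each class has at least M samples; the outcome vector y, the number of folds M and the variable fold_id are hypothetical and this is not the code used inside the package.

```r
set.seed(42)

# Hypothetical outcome with unbalanced classes.
y <- factor(rep(c("Basal", "Her2", "LumA"), times = c(12, 6, 9)))
M <- 3  # number of folds (plays the role of the folds argument)

fold_id <- integer(length(y))
for (cl in levels(y)) {
  idx <- which(y == cl)
  idx <- idx[sample.int(length(idx))]              # shuffle within the class
  fold_id[idx] <- rep_len(seq_len(M), length(idx)) # deal samples out to folds in turn
}

# Every cell is non-zero: each class is represented in each fold.
fold_table <- table(y, fold_id)
print(fold_table)
```

Leave-one-out corresponds to the limiting case where every sample forms its own fold.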
All combinations of test.keepX values are tested. A message reports how many models will be fitted on each component for a given test.keepX.
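The size of the grid grows multiplicatively with the number of values per block. As a rough base R illustration (not how the package enumerates the grid internally), the test.keepX list from the example at the end of this page expands as follows:

```r
# Same grid as in the example section of this page.
test.keepX <- list(mrna = seq(10, 40, 20),
                   mirna = seq(10, 30, 10),
                   protein = seq(1, 10, 5))

# One row per combination of keepX values across the three blocks.
grid <- expand.grid(test.keepX)
print(grid)

# Number of combinations per component: product of the lengths of each block's vector.
n_models <- prod(lengths(test.keepX))
print(n_models)
```

This grid has 2 x 3 x 2 = 12 combinations per component, and each one is evaluated in every fold and every repeat, which is why wide grids quickly become expensive.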
More details about the prediction distances in ?predict. More details about the PLS modes in ?pls.
Method:
Singh A., Gautier B., Shannon C., Vacher M., Rohart F., Tebbutt S. and Le Cao K.A. (2016). DIABLO: multi omics integration for biomarker discovery.
mixOmics manuscript:
Rohart F, Gautier B, Singh A, Le Cao K-A. mixOmics: an R package for 'omics feature selection and multiple data integration.
See block.splsda and http://www.mixOmics.org for more details.
# NOT RUN {
library(mixOmics)
data("breast.TCGA")
# X is a list of three blocks (mRNA, miRNA and protein); the outcome Y is the tumour subtype
data = list(mrna = breast.TCGA$data.train$mrna,
            mirna = breast.TCGA$data.train$mirna,
            protein = breast.TCGA$data.train$protein)
# set up a full design where every block is connected
# could also consider other weights, see our mixOmics manuscript
design = matrix(1, ncol = length(data), nrow = length(data),
                dimnames = list(names(data), names(data)))
diag(design) = 0
design
# set number of components per data set
ncomp = 5
# Tuning all ncomp components
# -------------
# definition of the keepX values to be tested for each block: mRNA, miRNA and protein
# names of test.keepX must match the names of 'data'
test.keepX = list(mrna = seq(10,40,20), mirna = seq(10,30,10), protein = seq(1,10,5))
# the following may take some time to run; note that for thorough tuning
# nrepeat should be > 1
tune = tune.block.splsda(X = data, Y = breast.TCGA$data.train$subtype,
                         ncomp = ncomp, test.keepX = test.keepX, design = design, nrepeat = 3)
tune$choice.ncomp
tune$choice.keepX
# Only tuning the second component
# -------------
already.mrna = 4 # 4 variables selected on comp1 for mrna
already.mirna = 2 # 2 variables selected on comp1 for mirna
already.prot = 1 # 1 variable selected on comp1 for protein
# entries are named like the blocks in 'data'
already.tested.X = list(mrna = already.mrna, mirna = already.mirna, protein = already.prot)
tune = tune.block.splsda(X = data, Y = breast.TCGA$data.train$subtype,
                         ncomp = 2, test.keepX = test.keepX, design = design,
                         already.tested.X = already.tested.X)
tune$choice.keepX
# }