
OmicsMarkeR (version 1.4.2)

fs.ensembl.stability: Ensemble Classification & Feature Selection

Description

Applies ensembles of models to high-dimensional data to both classify samples and determine which features are important for classification. The function bootstraps a user-specified number of times so that stability metrics can be computed for the selected features. This provides an important metric for biomarker investigations, namely whether the important variables can still be identified when the models are refit on 'different' data.
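The resample-select-compare idea described above can be sketched in base R. This is only an illustration, not the package's implementation: the toy data, the `top_features` helper, the mean-difference ranking, and the use of subsampling without replacement are all assumptions made for the example.

```r
set.seed(1)

# Toy data (illustrative): 40 samples, 20 features, two groups
X <- matrix(rnorm(40 * 20), nrow = 40,
            dimnames = list(NULL, paste0("f", 1:20)))
Y <- factor(rep(c("A", "B"), each = 20))
X[Y == "B", 1:3] <- X[Y == "B", 1:3] + 2  # make f1-f3 discriminating

k <- 5    # number of resampling iterations
p <- 0.9  # proportion of samples used for training
f <- 3    # number of features to keep

# Hypothetical stand-in for a model's feature ranking:
# absolute difference in group means
top_features <- function(X, Y, f) {
  score <- abs(colMeans(X[Y == "A", , drop = FALSE]) -
               colMeans(X[Y == "B", , drop = FALSE]))
  names(sort(score, decreasing = TRUE))[seq_len(f)]
}

# Draw k training subsets and record the top-f features of each
selected <- lapply(seq_len(k), function(i) {
  idx <- sample(nrow(X), size = floor(p * nrow(X)))
  top_features(X[idx, , drop = FALSE], Y[idx], f)
})

# Average pairwise Jaccard index across the k selections
pairs <- combn(k, 2)
jac <- apply(pairs, 2, function(ij) {
  a <- selected[[ij[1]]]
  b <- selected[[ij[2]]]
  length(intersect(a, b)) / length(union(a, b))
})
mean_stability <- mean(jac)
```

A value of `mean_stability` near 1 means the same features were chosen in nearly every resample; a value near 0 means the selections barely overlap.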

Usage

fs.ensembl.stability(X, Y, method, k = 10, p = 0.9,
    f = ceiling(ncol(X)/10), bags = 40,
    aggregation.metric = "CLA", stability.metric = "jaccard",
    optimize = TRUE, optimize.resample = FALSE, tuning.grid = NULL,
    k.folds = if (optimize) 10 else NULL,
    repeats = if (k.folds == "LOO") NULL else if (optimize) 3 else NULL,
    resolution = if (optimize) 3 else NULL, metric = "Accuracy",
    model.features = FALSE, allowParallel = FALSE, verbose = "none", ...)

Arguments

X
A matrix containing numeric values of each feature
Y
A factor vector containing group membership of samples
method
A vector listing models to be fit. Available options are "plsda" (Partial Least Squares Discriminant Analysis), "rf" (Random Forest), "gbm" (Gradient Boosting Machine), "svm" (Support Vector Machines), "glmnet" (Elastic-net Generalized Linear Model), and "pam" (Prediction Analysis of Microarrays)
k
Number of bootstrapped iterations
p
Proportion of data to be used for training
f
Number of features desired. Default is the top 10% of features, "f = ceiling(ncol(X)/10)". If rank correlation is desired, set "f = NULL"
bags
Number of iterations for ensemble bagging. Default "bags = 40"
aggregation.metric
String indicating which aggregation metric to use for features selected during bagging. Available options are "CLA" (Complete Linear), "EM" (Ensemble Mean), "ES" (Ensemble Stability), and "EE" (Ensemble Exponential)
stability.metric
String indicating the type of stability metric. Available options are "jaccard" (Jaccard Index/Tanimoto Distance), "sorensen" (Dice-Sorensen's Index), "ochiai" (Ochiai's Index), "pof" (Percent of Overlapping Features), "kuncheva" (Kuncheva's Stability Measures), "spearman" (Spearman Rank Correlation), and "canberra" (Canberra Distance)
optimize
Logical argument determining if each model should be optimized. Default "optimize = TRUE"
optimize.resample
Logical argument determining if each resample should be re-optimized. Default "optimize.resample = FALSE": only one optimization run is performed, and subsequent models use the initially determined parameters
tuning.grid
Optional list of grids containing parameters to optimize for each algorithm. Default "tuning.grid = NULL" lets the function create a grid determined by "resolution"
k.folds
Number of folds generated during cross-validation. May optionally be set to "LOO" for leave-one-out cross-validation. Default "k.folds = 10"
repeats
Number of times cross-validation repeated. Default "repeats = 3"
resolution
Optional - Resolution of model optimization grid. Default "resolution = 3"
metric
Criterion for model optimization. Available options are "Accuracy" (Prediction Accuracy), "Kappa" (Kappa Statistic), and "AUC-ROC" (Area Under the Curve - Receiver Operating Characteristic)
model.features
Logical argument determining whether the number of features selected should be determined by the individual model runs. Default "model.features = FALSE"
allowParallel
Logical argument dictating if parallel processing is allowed via the foreach package. Default "allowParallel = FALSE"
verbose
Character argument specifying how much output progress to print. Options are 'none', 'minimal' or 'full'.
...
Extra arguments that the user would like to apply to the models
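To make the set-based options for stability.metric more concrete, here is a minimal base R sketch of three of them computed on two feature subsets. The helper names and the example feature sets are hypothetical and are not OmicsMarkeR functions; the formulas are the standard textbook definitions of each index.

```r
# Hypothetical feature subsets selected in two bootstrap iterations
run1 <- c("f1", "f2", "f3", "f4")
run2 <- c("f2", "f3", "f4", "f5", "f6")

# Illustrative helpers (not part of OmicsMarkeR)
jaccard_index <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}
sorensen_index <- function(a, b) {
  2 * length(intersect(a, b)) / (length(a) + length(b))
}
ochiai_index <- function(a, b) {
  length(intersect(a, b)) / sqrt(length(a) * length(b))
}

jaccard_index(run1, run2)   # 3 shared / 6 in the union = 0.5
sorensen_index(run1, run2)  # 2 * 3 / (4 + 5), about 0.667
ochiai_index(run1, run2)    # 3 / sqrt(4 * 5), about 0.671
```

All three indices range from 0 (no shared features) to 1 (identical subsets); they differ only in how the overlap is normalized.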

Value

Methods
Vector of models fit to data
performance
Performance metrics of each model and bootstrap iteration
RPT
Robustness-Performance Trade-Off as defined in Saeys 2008
features
List concerning features determined via each algorithm's feature selection criteria.
  • metric: Stability metric applied
  • features: Matrix of selected features
  • stability: Matrix of pairwise comparisons and average stability
stability.models
Function perturbation metric - i.e. how similar are the features selected by each model.
all.tunes
If "optimize.resample = TRUE" then returns a list of optimized parameters for each bagging and bootstrap iteration.
final.best.tunes
If "optimize.resample = TRUE" then returns list of optimized parameters for each bootstrap of the bagged models refit to aggregated selected features.
specs
List with the following elements:
  • total.samples: Number of samples in original dataset
  • number.features: Number of features in original dataset
  • number.groups: Number of groups
  • group.levels: The specific levels of the groups
  • number.observations.group: Number of observations in each group
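
As a rough guide to interpreting the RPT value listed above, the sketch below uses the formula as generally stated for Saeys et al. (2008): a weighted harmonic mean of robustness (stability) and classification performance. The function name `rpt` and the beta argument are assumptions for illustration; the package may compute the value differently.

```r
# Illustrative helper (not the package's internal function).
# beta = 1 weights stability and performance equally (harmonic mean).
rpt <- function(stability, performance, beta = 1) {
  ((beta^2 + 1) * stability * performance) /
    (beta^2 * stability + performance)
}

rpt(stability = 0.6, performance = 0.9)  # 2 * 0.54 / 1.5 = 0.72
```

Because it is a harmonic mean, the score is pulled toward the lower of the two inputs, so a model with high accuracy but unstable feature selection still receives a modest RPT.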

References

Saeys Y., Abeel T., et al. (2008) Machine Learning and Knowledge Discovery in Databases. 313-325. http://link.springer.com/chapter/10.1007/978-3-540-87481-2_21

Examples

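## Not run: 
# 'vars' is a numeric feature matrix (X) and 'groups' is a factor of
# group labels (Y); both must be defined before running this call.
fits <- fs.ensembl.stability(vars,
                             groups,
                             method = c("plsda", "rf"),
                             f = 10,
                             k = 3,
                             k.folds = 10,
                             verbose = 'none')
## End(Not run)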
