"OmicsMarkeR"

OmicsMarkeR is an R package that provides functions for classification and feature selection of 'omics' level datasets.

Motivation

During my studies as a developing Systems Biologist, I found that many different techniques were used to answer the same initial question: how can I classify high-dimensional data (e.g. metabolomics, proteomics, transcriptomics)?
A second question usually posed in biomarker investigations is which features are most important to that classification.

I initially searched the CRAN and Bioconductor repositories. I found wonderful packages such as caret (which I highly recommend); however, I could not find a means of systematically running multiple algorithms together with stability metrics that give confidence in the features identified as important. This is critical, as there seems little practical benefit in classifying two or more groups if the features identified vary from one test to the next.

In my reading, I came upon an excellent chapter in Lecture Notes in Computer Science, Vol. 5212, entitled 'Robust Feature Selection Using Ensemble Feature Selection Techniques' by Yvan Saeys, Thomas Abeel, and Yves Van de Peer. That chapter prompted me to build this package: a tool that provides multiple multivariate classification and feature selection techniques, complete with multiple stability metrics and aggregation techniques. The package thereby offers a way to systematically compare data perturbation and function perturbation ensemble techniques, using a harmonic mean of feature robustness and classification performance to evaluate the optimal model for an individual dataset. This follows David Wolpert's 'No Free Lunch Theorem': no single model is appropriate for all problems.
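To make the idea concrete, here is a minimal base-R sketch of a data perturbation ensemble followed by the robustness-performance trade-off. It does not use the package's API: the simulated data, the t-statistic ranking, the helper names (`top_k_features`, `jaccard_sketch`, `rpt_sketch`), the 80% subsampling fraction, and the placeholder accuracy are all assumptions made for illustration. The trade-off is written as the F-measure-style formula from the chapter cited above, which reduces to the harmonic mean when `beta = 1`.

```r
set.seed(42)

# Simulated data (illustrative only): 40 samples x 50 features, two groups.
# The first 5 features are shifted in group B so they are discriminatory.
n_samples  <- 40
n_features <- 50
groups <- factor(rep(c("A", "B"), each = n_samples / 2))
x <- matrix(rnorm(n_samples * n_features), nrow = n_samples,
            dimnames = list(NULL, paste0("f", seq_len(n_features))))
x[groups == "B", 1:5] <- x[groups == "B", 1:5] + 2

# Hypothetical selector: the k features with the largest absolute
# Welch t-statistic between the two groups.
top_k_features <- function(x, groups, k) {
  t_stats <- vapply(colnames(x), function(j) {
    unname(t.test(x[, j] ~ groups)$statistic)
  }, numeric(1))
  names(sort(abs(t_stats), decreasing = TRUE))[seq_len(k)]
}

# Data perturbation: rerun the same selector on stratified subsamples
# (80% of each group, drawn without replacement).
n_runs <- 20
k <- 5
selections <- lapply(seq_len(n_runs), function(i) {
  idx <- unlist(lapply(split(seq_len(n_samples), groups), function(g) {
    sample(g, size = floor(0.8 * length(g)))
  }))
  top_k_features(x[idx, , drop = FALSE], groups[idx], k)
})

# Robustness: mean pairwise Jaccard index over all pairs of runs.
jaccard_sketch <- function(a, b) length(intersect(a, b)) / length(union(a, b))
run_pairs <- combn(n_runs, 2)
stability <- mean(apply(run_pairs, 2, function(p) {
  jaccard_sketch(selections[[p[1]]], selections[[p[2]]])
}))

# Robustness-performance trade-off; beta = 1 gives the harmonic mean.
rpt_sketch <- function(stability, performance, beta = 1) {
  (beta^2 + 1) * stability * performance /
    (beta^2 * stability + performance)
}

performance <- 0.9  # placeholder accuracy, not a measured result
rpt_value <- rpt_sketch(stability, performance)
print(c(stability = stability, rpt = rpt_value))
```

A function perturbation ensemble would, under the same assumptions, keep the data fixed and vary the selector instead, then compare the resulting feature sets in the same way.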

I have made every effort to cite the articles in which each technique was originally developed or applied. The interested reader, as well you should be, is strongly encouraged to seek out these articles.

Installation

Stable version (Bioconductor)

source("http://bioconductor.org/biocLite.R")
biocLite("OmicsMarkeR")

Features in Progress

  1. Access to fitted models (averaged or all bootstrapped results?)
  2. Easy graphics access (scores/loadings plots, variable importance plots, etc.)
  3. Summary graphics (across models)
  4. Database searching (HMDB, MMCD, Metlin, LipidMaps, etc.)
  5. Additional algorithms
  6. Additional ensemble methods (bayesian, boosting, etc.)

Version

1.4.2

License

GPL-3

Maintainer

Charles Jr

Last Published

February 15th, 2017

Functions in OmicsMarkeR (1.4.2)

feature.table: Feature Consistency Table
perf.calc: Performance Statistics Calculations
performance.stats: Performance Statistics (Internal for perf.calc)
svmrfeFeatureRanking: SVM Recursive Feature Extraction (Binary)
EM: Ensemble Mean Aggregation
create.corr.matrix: Correlated Multivariate Data Generator
sorensen: Dice-Sorensen's Index
svm.weights: SVM Multiclass Weights Ranking
EE: Ensemble Exponential Aggregation
extract.features: Feature Extraction
training: Model Training
kuncheva: Kuncheva's Index
canberra_stability: Canberra Stability
denovo.grid: Denovo Grid Generation
modelTuner: Model Tuner
RPT: Robustness-Performance Trade-Off
create.random.matrix: Random Multivariate Data Generator
noise.matrix: Noise Matrix Generator
CLA: Complete Linear Aggregation
pairwise.model.stability: Pairwise Model Stability Metrics
ochiai: Ochiai's Index
perm.class: Monte Carlo Permutation of Model Performance
svmrfeFeatureRankingForMulticlass: SVM Recursive Feature Extraction (Multiclass)
pof: Percentage of Overlapping Features
spearman: Spearman Rank Correlation Coefficient
optimize.model: Model Optimization and Metrics
pairwise.stability: Pairwise Stability Metrics
fs.ensembl.stability: Ensemble Classification & Feature Selection
perm.features: Feature Selection via Monte Carlo Permutation
fit.only.model: Fit Models without Feature Selection
canberra: Canberra Distance
create.discr.matrix: Discriminatory Multivariate Data Generator
fs.stability: Classification & Feature Selection
jaccard: Jaccard Index
bagging.wrapper: Bagging Wrapper for Ensemble Features Selection
modelTuner_loo: Model Tuner for Leave-One-Out Cross-Validation
predictNewClasses: Class Prediction
params: Model Parameters and Properties
prediction.metrics: Prediction Metric Calculations
predicting: Model Group Prediction
ES: Ensemble Stability Aggregation
extract.args: Argument extractor
sequester: Sequester Additional Parameters
performance.metrics: Performance Metrics of fs.stability or fs.ensembl.stability object
tune.instructions: Model Optimization Instructions
aggregation: Feature Aggregation
modelList: Model List
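
Several of the functions listed above are named after standard set-similarity indices used to compare two selected feature subsets. As a rough guide to what those indices measure, the textbook definitions can be sketched in base R as follows. The function names ending in `_sketch` are hypothetical and are not the package's functions, whose arguments and return values may differ; the feature names and the total of 10 features are made up for the example.

```r
# Textbook definitions, assuming `a` and `b` are character vectors of
# unique feature names selected in two different runs.

# Jaccard index: shared features over all features seen in either run.
jaccard_sketch <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Dice-Sorensen index: twice the overlap over the sum of subset sizes.
dice_sorensen_sketch <- function(a, b) {
  2 * length(intersect(a, b)) / (length(a) + length(b))
}

# Ochiai index: overlap over the geometric mean of the subset sizes.
ochiai_sketch <- function(a, b) {
  length(intersect(a, b)) / sqrt(length(a) * length(b))
}

# Kuncheva index: overlap corrected for the overlap expected by chance,
# for two subsets of equal size k drawn from n_features features.
kuncheva_sketch <- function(a, b, n_features) {
  k <- length(a)
  stopifnot(length(b) == k, k > 0, k < n_features)
  r <- length(intersect(a, b))
  (r * n_features - k^2) / (k * (n_features - k))
}

run1 <- c("f1", "f2", "f3", "f4")
run2 <- c("f3", "f4", "f5", "f6")

jaccard_sketch(run1, run2)                    # 2 / 6
dice_sorensen_sketch(run1, run2)              # 4 / 8
ochiai_sketch(run1, run2)                     # 2 / 4
kuncheva_sketch(run1, run2, n_features = 10)  # 4 / 24
```

Unlike the other three, the Kuncheva index depends on the total number of features, so it can be negative when two subsets overlap less than chance would predict.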