The exprso package includes these feature selection modules:
- fsSample
- fsNULL
- fsANOVA
- fsStats
- fsPrcomp
- fsEbayes
- fsEdger
- fsMrmre
- fsPropd
Given the high dimensionality of most genomic datasets, it is prudent, and often necessary,
to prioritize which features to include during classifier construction. Although many
feature selection methods exist, this package provides wrappers for some of the most popular ones.
Each wrapper (1) pre-processes the ExprsArray input, (2) performs the feature selection,
and (3) returns an ExprsArray output with an updated feature selection history.
You can use any number of feature selection methods in tandem, and in any order.
For all feature selection methods, the @preFilter and @reductionModel slots store the
feature selection and dimension reduction history, respectively. This history gets passed
along to prepare the test or validation set during model deployment, ensuring that these
sets undergo the same feature selection and dimension reduction as the training set.
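To make the idea of a replayed history concrete, here is a minimal base R sketch that does not use exprso at all. The names pre_filter and reduction_model are hypothetical stand-ins for the two kinds of history described above, the data are simulated, and the features-as-rows layout is an assumption made for illustration. It is not the package's internal implementation.

```r
set.seed(1)

# Simulated data (assumed layout: rows are features, columns are subjects).
n_feat <- 200
feat_names <- paste0("feat", seq_len(n_feat))
train <- matrix(rnorm(n_feat * 40), nrow = n_feat,
                dimnames = list(feat_names, paste0("train", 1:40)))
test <- matrix(rnorm(n_feat * 20), nrow = n_feat,
               dimnames = list(feat_names, paste0("test", 1:20)))
group <- rep(c("Control", "Case"), each = 20)

# Feature selection on the training set only: rank features by t-test
# p-value and keep the names of the best 50.
# `pre_filter` is a hypothetical stand-in for a feature selection history.
pvals <- apply(train, 1, function(x) {
  t.test(x[group == "Case"], x[group == "Control"])$p.value
})
pre_filter <- names(sort(pvals))[1:50]

# Dimension reduction on the selected training features.
# `reduction_model` is a hypothetical stand-in for a dimension reduction history.
reduction_model <- prcomp(t(train[pre_filter, ]))
train_pcs <- reduction_model$x

# Deployment: replay the stored history on the test set. Nothing is
# re-estimated; the test set is only subset and then projected.
test_pcs <- predict(reduction_model, newdata = t(test[pre_filter, ]))
```

The point of the sketch is that the test set never influences which features are kept or how the principal components are defined. It only receives the choices already made on the training set.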
When you plan to apply multiple feature selection or dimension reduction steps, the top
argument manages which features (e.g., gene expression values) to send through each
feature selection or dimension reduction procedure. For top, a numeric scalar indicates
the number of top features to use, while a character vector indicates specifically which
features to use. In this way, the user sets which features to feed INTO the fs method
(NOT which features the user expects OUT). For example, to apply dimension reduction to
the top 50 features as selected by Student's t-test, run the t-test selection first, then
call the dimension reduction method with top = 50. Set top = 0 to pass all features
through an fs method.
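The two-step case described above might look like the following sketch. It assumes that an ExprsArray training set named array.train already exists (a hypothetical object) and that fsStats accepts how = "t.test" to request the Student's t-test; check ?fsStats for the exact argument names in your installed version.

```r
library(exprso)

# ASSUMPTION: `array.train` is an existing ExprsArray training set with
# at least 50 features. It is not created in this sketch.

# Step 1: send ALL features (top = 0) into the t-test ranking.
# ASSUMPTION: `how = "t.test"` selects the Student's t-test; see ?fsStats.
array.train <- fsStats(array.train, top = 0, how = "t.test")

# Step 2: send only the top 50 ranked features INTO the dimension reduction.
array.train <- fsPrcomp(array.train, top = 50)
```

Note that top = 50 in the second call describes the input to fsPrcomp, not the number of components it returns.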
Note that not all feature selection methods will generalize to multi-class data.
A feature selection method will fail when applied to an ExprsMulti object
unless that feature selection method has an ExprsMulti method.