fs: Select Features

Description

The exprso package includes these feature selection modules:

- fsSample

- fsNULL

- fsANOVA

- fsInclude

- fsStats

- fsPrcomp

- fsEbayes

- fsEdger

- fsMrmre

- fsPropd

Arguments

Details

Given the high dimensionality of most genomic datasets, it is prudent, and often necessary, to prioritize which features to include during classifier construction. Although many feature selection methods exist, this package provides wrappers for some of the most popular ones. Each wrapper (1) pre-processes the ExprsArray input, (2) performs the feature selection, and (3) returns an ExprsArray output with an updated feature selection history. You can chain any number of feature selection methods, in any order.

For all feature selection methods, the @preFilter and @reductionModel slots store the feature selection and dimension reduction history, respectively. This history is passed along during model deployment to prepare the test or validation set, ensuring that these sets undergo the same feature selection and dimension reduction as the training set.
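The idea of replaying a stored history can be sketched in base R, without using exprso itself. The simulated data, the train/test split, and the names history_filter and pca below are assumptions made for illustration; they only play roles analogous to @preFilter and @reductionModel and do not reproduce the package's internals.

```r
set.seed(1)

# Simulated data (assumption): 40 samples x 200 features, two classes
x <- matrix(rnorm(40 * 200), nrow = 40,
            dimnames = list(NULL, paste0("feat", 1:200)))
y <- factor(rep(c("Control", "Case"), each = 20))
x[y == "Case", 1:5] <- x[y == "Case", 1:5] + 2  # make 5 features informative

train <- seq(1, 40, by = 2)    # odd rows form the training set
test  <- setdiff(1:40, train)  # even rows form the test set

# Step 1: rank features by t-test p-value, using the training set only
p <- apply(x[train, ], 2, function(v) {
  t.test(v[y[train] == "Case"], v[y[train] == "Control"])$p.value
})
history_filter <- names(sort(p))[1:50]  # role analogous to @preFilter

# Step 2: fit PCA on the 50 selected features of the training set
pca <- prcomp(x[train, history_filter])  # role analogous to @reductionModel

# Deployment: replay both steps on the test set, without refitting anything
test_scores <- predict(pca, newdata = x[test, history_filter])
dim(test_scores)
```

Because the test set is subset and projected using objects fitted on the training set alone, no information from the test samples leaks into the selection or the reduction.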

When applying multiple feature selection or dimension reduction steps, the top argument controls which features (e.g., gene expression values) are sent through each step. For top, a numeric scalar gives the number of top features to use, while a character vector names the specific features to use. In this way, the user sets which features to feed INTO the fs method (NOT which features the user expects OUT). The example below shows how to apply dimension reduction to the top 50 features as selected by the Student's t-test. Set top = 0 to pass all features through an fs method.
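The semantics of top described above can be sketched as a small base R helper. The function resolve_top and the feature names are hypothetical and are not part of exprso; the sketch also assumes that a numeric top larger than the number of available features falls back to all features, which the package may handle differently.

```r
# Hypothetical helper (not part of exprso): decide which features go INTO a step.
# `ranked` is a character vector of feature names, best first.
resolve_top <- function(ranked, top) {
  if (is.numeric(top)) {
    # top = 0 means "pass all features through";
    # treating an oversized top the same way is an assumption of this sketch
    if (top == 0 || top > length(ranked)) return(ranked)
    return(ranked[seq_len(top)])
  }
  if (is.character(top)) {
    stopifnot(all(top %in% ranked))  # named features must exist
    return(top)
  }
  stop("`top` must be a numeric scalar or a character vector")
}

# Pretend an earlier fs step has already ranked 200 features
ranked <- paste0("gene", 1:200)

n_numeric <- length(resolve_top(ranked, 50))      # first 50 ranked features
by_name   <- resolve_top(ranked, c("gene7", "gene3"))  # exactly these two
n_all     <- length(resolve_top(ranked, 0))       # every feature
```

In each case the result is the input to the next step; how many features that step returns is a separate matter.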

Note that not all feature selection methods will generalize to multi-class data. A feature selection method will fail when applied to an ExprsMulti object unless that feature selection method has an ExprsMulti method.