exprso (version 0.1.8)

fs: Perform Feature Selection

Description

A collection of functions to select features.

Usage

fsSample(object, top = 0, ...)

fsNULL(object, top = 0, ...)

fsANOVA(object, top = 0, ...)

fsInclude(object, top = 0, include)

fsStats(object, top = 0, ...)

fsPrcomp(object, top = 0, ...)

fsPathClassRFE(object, top = 0, ...)

fsEbayes(object, top = 0, ...)

fsMrmre(object, top = 0, ...)

# S4 method for ExprsArray
fsSample(object, top = 0, ...)

# S4 method for ExprsArray
fsNULL(object, top = 0, ...)

# S4 method for ExprsBinary
fsInclude(object, top = 0, include)

# S4 method for ExprsArray
fsANOVA(object, top = 0, ...)

# S4 method for ExprsBinary
fsStats(object, top = 0, how = c("t.test", "ks.test"), ...)

# S4 method for ExprsBinary
fsPrcomp(object, top = 0, ...)

# S4 method for ExprsBinary
fsPathClassRFE(object, top = 0, ...)

# S4 method for ExprsBinary
fsEbayes(object, top = 0, ...)

# S4 method for ExprsBinary
fsMrmre(object, top = 0, ...)

Arguments

object

Specifies the ExprsArray object to undergo feature selection.

top

A numeric scalar, numeric vector, or character vector. A numeric scalar indicates the number of top-ranked features that should undergo feature selection; set top = 0 to include all features. A character vector indicates specifically which features, by name, should undergo feature selection. A numeric vector indicates specific features by position, similar to a character vector.

...

Arguments passed to the respective wrapped function.

include

A character vector. The names of features to rank above all others; all other features keep their existing order. Argument for fsInclude only.

how

A character string. Toggles between the sub-routines "t.test" and "ks.test". Argument for fsStats only.
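The rules for top and include can be sketched in a few lines of base R. The helpers resolveTop and rankInclude below are hypothetical illustrations, not exprso functions. They assume the feature names arrive already ranked (best first), treat any length-one numeric top as a count, cap a count larger than the number of features (an assumption), and keep the include names in the order given.

```r
# Hypothetical helpers (not part of exprso) illustrating the argument
# rules above. `features` is assumed to be ordered by rank, best first.

# `top`: which features are passed INTO a feature selection method
resolveTop <- function(features, top = 0) {
  if (is.character(top)) {
    stopifnot(all(top %in% features))            # select by name
    return(top)
  }
  if (length(top) == 1) {
    if (top == 0) return(features)               # 0 means all features
    return(features[seq_len(min(top, length(features)))])  # first `top`
  }
  features[top]                                  # select by position
}

# `include`: move named features to the front, keep the rest in order
rankInclude <- function(features, include) {
  stopifnot(all(include %in% features))
  c(include, setdiff(features, include))
}

ranked <- c("g3", "g1", "g5", "g2", "g4")
resolveTop(ranked, 0)                # "g3" "g1" "g5" "g2" "g4"
resolveTop(ranked, 2)                # "g3" "g1"
resolveTop(ranked, c("g2", "g5"))    # "g2" "g5"
resolveTop(ranked, c(1, 4))          # "g3" "g2"
rankInclude(ranked, c("g4", "g1"))   # "g4" "g1" "g3" "g5" "g2"
```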

Value

Returns an ExprsArray object.

Methods (by generic)

fsSample: Method to perform random feature selection using base::sample.

fsNULL: Method to perform a NULL feature selection and return input unaltered.

fsInclude: Method to rank explicitly stated features above all others.

fsANOVA: Method to perform ANOVA feature selection using stats::aov.

fsStats: Method to perform statistics-based feature selection using stats::t.test or stats::ks.test.

fsPrcomp: Method to perform principal components analysis using stats::prcomp.

fsPathClassRFE: Method to perform SVM-RFE feature selection using pathClass::fit.rfe.

fsEbayes: Method to perform empirical Bayes feature selection using limma::ebayes.

fsMrmre: Method to perform mRMR feature selection using mRMRe::mRMR.classic.
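The kind of ranking fsStats performs can be sketched with stats::t.test and stats::ks.test directly. rankByTest is a hypothetical helper, not exprso's implementation. It assumes features in rows and samples in columns (as in a Biobase ExpressionSet), a two-level class label, and ranking by ascending p-value; stats::t.test is used with its default Welch correction.

```r
# Sketch (not exprso's code): rank features by two-sample test p-value.
# Assumes features in rows, samples in columns, and a two-level label.
rankByTest <- function(x, labels, how = c("t.test", "ks.test")) {
  how <- match.arg(how)
  labels <- factor(labels)
  stopifnot(nlevels(labels) == 2)
  grp <- labels == levels(labels)[1]
  pvals <- apply(x, 1, function(values) {
    if (how == "t.test") {
      stats::t.test(values[grp], values[!grp])$p.value
    } else {
      stats::ks.test(values[grp], values[!grp])$p.value
    }
  })
  names(sort(pvals))  # smallest p-value (strongest evidence) first
}

set.seed(1)
x <- matrix(rnorm(5 * 20), nrow = 5,
            dimnames = list(paste0("g", 1:5), NULL))
labels <- rep(c("ALL", "AML"), each = 10)
x["g4", labels == "AML"] <- x["g4", labels == "AML"] + 4  # plant a signal
rankByTest(x, labels, "t.test")
rankByTest(x, labels, "ks.test")
```

With the planted shift, g4 should come first under either test.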

Details

Considering the high dimensionality of most genomic datasets, it is prudent and often necessary to prioritize which features to include during classifier construction. Although many feature selection methods exist, this package provides wrappers for some of the most popular ones. Each wrapper (1) pre-processes the ExprsArray input, (2) performs the feature selection, and (3) returns an ExprsArray output with an updated feature selection history. Any number of feature selection methods can be used in tandem, in any order.
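The three-step wrapper pattern and free chaining can be illustrated with a toy object. Everything here is hypothetical: newArray, fsStep, byVariance and byMeanDiff are not exprso functions, the object is a plain list rather than an ExprsArray, and the sketch assumes each step keeps only the features passed into it.

```r
# Toy stand-in for an ExprsArray: a feature-by-sample matrix, class
# labels, and a list recording each feature selection step.
newArray <- function(x, labels) {
  list(exprs = x, labels = labels, preFilter = list())
}

# Generic wrapper following the three steps described above.
fsStep <- function(array, top = 0, rankFun) {
  # (1) Pre-process: choose which features go INTO this step
  feats <- rownames(array$exprs)
  if (is.character(top)) {
    feats <- top
  } else if (top > 0) {
    feats <- feats[seq_len(min(top, length(feats)))]
  }
  # (2) Perform the selection: rank the chosen features
  ranked <- rankFun(array$exprs[feats, , drop = FALSE], array$labels)
  # (3) Return the reordered object with an updated history
  array$exprs <- array$exprs[ranked, , drop = FALSE]
  array$preFilter <- c(array$preFilter, list(ranked))
  array
}

# Two interchangeable ranking rules (best feature first)
byVariance <- function(x, labels) {
  names(sort(apply(x, 1, var), decreasing = TRUE))
}
byMeanDiff <- function(x, labels) {
  inA <- labels == labels[1]
  d <- abs(rowMeans(x[, inA, drop = FALSE]) - rowMeans(x[, !inA, drop = FALSE]))
  names(sort(d, decreasing = TRUE))
}

set.seed(2)
x <- matrix(rnorm(6 * 8), nrow = 6, dimnames = list(paste0("g", 1:6), NULL))
arr <- newArray(x, labels = rep(c("A", "B"), each = 4))
arr <- fsStep(arr, top = 0, rankFun = byVariance)  # rank all 6 features
arr <- fsStep(arr, top = 3, rankFun = byMeanDiff)  # re-rank the top 3 only
length(arr$preFilter)  # 2 steps recorded
nrow(arr$exprs)        # 3 features remain
```

Swapping the two fsStep calls works equally well, which is the point of a uniform wrapper.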

For all feature selection methods, the @preFilter and @reductionModel slots store the feature selection and dimension reduction history, respectively. This history is passed along to prepare the test or validation set during model deployment, ensuring that these sets undergo the same feature selection and dimension reduction as the training set.
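Why this history matters can be seen with a bare stats::prcomp model: the feature list and the rotation are learned on the training set only and then replayed on new data. The objects keep and pca below merely stand in for what @preFilter and @reductionModel hold; the variance-based selection is an arbitrary choice for illustration, not an exprso method.

```r
# Sketch (not exprso's code): record what the training set went through,
# then replay the same steps on a test set.
set.seed(3)
genes <- paste0("g", 1:10)
train <- matrix(rnorm(10 * 12), nrow = 10, dimnames = list(genes, NULL))
test  <- matrix(rnorm(10 * 5),  nrow = 10, dimnames = list(genes, NULL))

# Training: keep 4 features (stand-in for the feature selection history),
# then fit PCA (stand-in for the dimension reduction history) on
# samples-by-features data.
keep <- names(sort(apply(train, 1, var), decreasing = TRUE))[1:4]
pca  <- stats::prcomp(t(train[keep, ]))

# Deployment: apply the stored feature list and the stored PCA model,
# never refitting either on the test data.
testScores <- predict(pca, newdata = t(test[keep, ]))
dim(testScores)   # 5 samples by 4 components
```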

When users plan to apply multiple feature selection or dimension reduction steps, the top argument manages which features (e.g., gene expression values) to send through each feature selection or dimension reduction procedure. For top, a numeric scalar indicates the number of top-ranked features to use, while a character vector indicates specifically which features to use. In this way, the user sets which features to feed INTO the fs method (NOT which features the user expects OUT). The example below shows how to apply dimension reduction (fsPrcomp) to the top 50 features as ranked by a t-test (fsStats with how = "t.test"). Set top = 0 to pass all features through an fs method.
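The same two-stage workflow can be mimicked in base R to show that top controls the input of a step, not its output. This is an illustration on simulated data, not exprso's code: it ranks all features by stats::t.test p-value (Welch, the R default) and feeds the 50 best into stats::prcomp, which then returns as many components as the data allow.

```r
set.seed(4)
nFeat <- 200
nSamp <- 30
x <- matrix(rnorm(nFeat * nSamp), nrow = nFeat,
            dimnames = list(paste0("g", seq_len(nFeat)), NULL))
labels <- rep(c("ALL", "AML"), each = nSamp / 2)

# Stage 1 (like top = 0): rank ALL features by t-test p-value
pvals <- apply(x, 1, function(values)
  stats::t.test(values[labels == "ALL"], values[labels == "AML"])$p.value)
ranked <- names(sort(pvals))

# Stage 2 (like top = 50): feed only the 50 best-ranked features INTO PCA
pca <- stats::prcomp(t(x[ranked[1:50], ]))

# What comes OUT is decided by the method: here, principal components
ncol(pca$x)   # min(30 samples, 50 features) = 30 components
```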

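Note that not all feature selection methods generalize to multi-class data. A feature selection method will fail when applied to an ExprsMulti object unless that feature selection method has an ExprsMulti method. As listed under Usage, only fsSample, fsNULL, and fsANOVA are defined for the general ExprsArray class; the remaining methods are defined for ExprsBinary objects only.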
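Why some methods are binary-only is easy to see with the underlying stats functions: an ANOVA F-test accepts any number of classes, while a two-sample t-test does not. The snippet uses simulated data and only base stats calls; it does not call exprso.

```r
set.seed(5)
values <- rnorm(15)
labels3 <- factor(rep(c("A", "B", "C"), each = 5))

# ANOVA generalizes to any number of classes
fit <- stats::aov(values ~ labels3)
pAnova <- summary(fit)[[1]][["Pr(>F)"]][1]

# A two-sample t-test does not: the formula interface requires exactly
# two groups and stops with an error otherwise
tErr <- tryCatch(stats::t.test(values ~ labels3),
                 error = function(e) conditionMessage(e))
tErr
```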

Note that fsMrmre crashes when supplied a very large feature_count argument (passed through ... to mRMRe::mRMR.classic), owing to its implementation in the imported package mRMRe.

See Also

fs build doMulti exprso-predict plCV plGrid plGridMulti plMonteCarlo plNested

Examples

# NOT RUN {
library(golubEsets)
data(Golub_Merge)
array <- arrayEset(Golub_Merge, colBy = "ALL.AML", include = list("ALL", "AML"))
array <- modFilter(array, 20, 16000, 500, 5) # pre-filter Golub as in Deb 2003
array <- modTransform(array) # log transform
array <- modNormalize(array, c(1, 2)) # normalize gene and subject vectors
arrays <- splitSample(array, percent.include = 67)
array.train <- fsStats(arrays[[1]], top = 0, how = "t.test")
array.train <- fsPrcomp(array.train, top = 50)
mach <- buildSVM(array.train, top = 5, kernel = "linear", cost = 1)
# }
