featureScore: Feature Selection in NMF Models

Description

The function featureScore implements different methods to compute basis-specificity scores for each feature in the data.

The function extractFeatures implements different methods to select the most basis-specific features of each basis component.

Usage

featureScore(object, ...)

## S3 method for class 'matrix'
featureScore(object, method = c("kim", "max"))

extractFeatures(object, ...)

## S3 method for class 'matrix'
extractFeatures(object, method = c("kim", "max"),
    format = c("list", "combine", "subset"), nodups = TRUE)

Arguments

object

an object from which scores/features are computed/extracted

...

extra arguments to allow extension

method

scoring or selection method. It specifies the name of one of the methods described in sections Feature scores and Feature selection.

Additionally for extractFeatures, it may be an integer vector that indicates the nu

format

output format. The following values are accepted: "list", "combine", "subset".

nodups

logical that indicates whether duplicated indexes, i.e. features selected on multiple basis components (which should in theory not happen), should appear only once in the result. Only used when format='combine'.

Value

featureScore returns a numeric vector whose length is the number of rows in object (i.e. one score per feature).
extractFeatures returns the selected features as a list of indexes, a single integer vector, or an object of the same class as object that contains only the selected features.

Details

One of the properties of Nonnegative Matrix Factorization is that it tends to produce sparse representations of the observed data. This leads to a natural application to bi-clustering, which characterises groups of samples by a small number of features.

In NMF models, samples are grouped according to the basis component that contributes the most to each sample, i.e. the basis component that has the greatest coefficient in each column of the coefficient matrix (see predict,NMF-method). Each group of samples is then characterised by a set of features selected based on basis-specificity scores that are computed on the basis matrix.
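
The grouping and scoring steps described above can be sketched in base R, without the NMF package. The matrices W (basis) and H (coefficients) are random stand-ins, and kim_score is a hypothetical helper that follows the entropy-based score of Kim and Park (2007). It is an illustration under these assumptions, not the package's implementation.

```r
set.seed(42)
n_features <- 50; n_samples <- 20; k <- 3

# Hypothetical stand-ins for the basis and coefficient matrices of an NMF model
W <- matrix(runif(n_features * k), n_features, k)
H <- matrix(runif(k * n_samples), k, n_samples)

# Group samples: each sample goes to the basis component with the
# greatest coefficient in its column of H
sample_group <- apply(H, 2, which.max)

# Hypothetical helper: entropy-based basis-specificity score, one per feature.
# Scores lie in [0, 1]; higher means the feature is more specific to one basis.
kim_score <- function(W) {
  k <- ncol(W)
  p <- W / rowSums(W)                      # relative contribution of each basis
  plogp <- ifelse(p > 0, p * log2(p), 0)   # treat 0 * log2(0) as 0
  1 + rowSums(plogp) / log2(k)
}

scores <- kim_score(W)
```

A feature that loads on a single basis component gets a score of 1, while a feature that contributes equally to all components gets a score of 0.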

References

Kim H and Park H (2007). "Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis." _Bioinformatics (Oxford, England)_, *23*(12), pp. 1495-502. ISSN 1460-2059.

Carmona-Saez P, Pascual-Marqui RD, Tirado F, Carazo JM and Pascual-Montano A (2006). "Biclustering of gene expression data by Non-smooth Non-negative Matrix Factorization." _BMC bioinformatics_, *7*, pp. 78. ISSN 1471-2105.

Examples


library(NMF)

# random NMF model of rank 3 for a 50 x 20 target matrix
x <- rnmf(3, 50, 20)

# probably no feature is selected
extractFeatures(x)
# extract top 5 for each basis
extractFeatures(x, 5L)
# extract features that have a relative basis contribution above a threshold
extractFeatures(x, 0.5)
# ambiguity?
extractFeatures(x, 1) # means relative contribution above 100%
extractFeatures(x, 1L) # means top contributing feature in each component
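
The distinction between the last two calls rests on R's numeric types: a bare 1 is a double, while 1L is an integer. A minimal base-R check, independent of the NMF package:

```r
# A bare numeric literal is a double; the L suffix makes it an integer
is.integer(1)              # FALSE
is.integer(1L)             # TRUE
# A count held in a double variable can be converted explicitly
n_top <- as.integer(5)
is.integer(n_top)          # TRUE
```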
