nmfEstimateRank: Estimate optimal rank for Nonnegative Matrix Factorization (NMF) models

Description

A critical parameter in NMF algorithms is the factorization rank $r$, which defines the number of basis effects used to approximate the target matrix. The function nmfEstimateRank helps choose an optimal rank by implementing simple approaches proposed in the literature.

Usage

nmfEstimateRank(x, range, method = nmf.getOption("default.algorithm"), nrun = 30, verbose=FALSE, stop=FALSE, ...)
## S3 method for class 'NMF.rank':
plot(x, what = c('all', 'cophenetic', 'rss', 'residuals'
						, 'dispersion', 'evar', 'sparseness'
						, 'sparseness.basis', 'sparseness.coef')
						, ref=NULL, na.rm=FALSE, ...)

Arguments

method

A single NMF algorithm, in one of the formats accepted by interface nmf.

na.rm

a single logical that specifies whether the ranks for which the measures are NA should be removed from the graph (default: FALSE). This is useful when plotting results that include NAs due to errors during the estimation pr

nrun

a numeric value giving the number of runs to perform for each value in range.

range

a numeric vector containing the factorization ranks to try.

ref

reference object of class NMF.rank, as returned by function nmfEstimateRank. The measures contained in ref are used and plotted as a reference. The associated curves are drawn in red, while those from

verbose

toggle verbosity. This parameter only affects the verbosity of the outer loop over the values in range. To print verbose (resp. debug) messages from each NMF run, one can use .options='v' (resp. .options='d') tha

stop

logical flag for running the estimation process with fault tolerance. When TRUE, the whole execution will stop if any error is raised. When FALSE (default), the runs that raise an error will be skipped, and the execution wil

what

a character string that partially matches one of the following items: 'all', 'cophenetic', 'rss', 'residuals', 'dispersion'. It specifies which measure must be plotted (

x

For nmfEstimateRank, the target object to be estimated, in one of the formats accepted by interface nmf. For plot.NMF.rank, an object of class NMF.rank as returned by

...

For nmfEstimateRank, these are extra parameters passed to interface nmf. Note that the same parameters are used for each value of the rank. See nmf. For plot.NMF.ran
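The fault tolerance controlled by stop, and the NA values that na.rm can later drop from the plot, can be pictured with the hypothetical base-R sketch below. It is not the package's implementation: estimate_with_tolerance, fit_one_rank and stop_on_error are illustrative names, and treating each whole rank as one unit of work is a simplifying assumption (the description of stop refers to individual runs).

```r
# Hypothetical sketch of the behaviour described for `stop` (here `stop_on_error`):
# each unit of work runs inside tryCatch(); on error, either abort everything
# or record NA for that rank and continue with the next one.
estimate_with_tolerance <- function(ranks, fit_one_rank, stop_on_error = FALSE) {
  results <- setNames(vector("list", length(ranks)), as.character(ranks))
  for (k in ranks) {
    results[[as.character(k)]] <- tryCatch(
      fit_one_rank(k),
      error = function(e) {
        if (stop_on_error) stop(e)  # re-raise: the whole estimation stops
        warning(sprintf("rank %s failed: %s", k, conditionMessage(e)))
        NA                          # placeholder, later shown (or dropped) as NA
      }
    )
  }
  results
}
```

For example, with a fitting function that fails for rank 4, the default call returns NA for that rank and keeps the others, while stop_on_error = TRUE propagates the error.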

Value

An S3 object (i.e. a list) of class NMF.rank with the following elements:
measures: a data.frame containing the quality measures for each factorization rank in range. Each row corresponds to a measure, each column to a rank.
consensus: a list of consensus matrices, indexed by the factorization rank (as a character string).
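Several of the measures named in the what argument ('rss', 'evar', 'sparseness') have common textbook definitions, sketched below in base R. This is a hedged illustration: the helper names are hypothetical, and whether nmfEstimateRank computes each measure exactly this way is an assumption. W is the basis matrix and H the coefficient matrix, so that V is approximated by W %*% H.

```r
# Hypothetical helpers using common definitions; not confirmed to match
# the package's internal computations.

# Residual sum of squares between the target V and its approximation W %*% H.
rss <- function(V, W, H) sum((V - W %*% H)^2)

# Explained variance: share of the total sum of squares of V reproduced by W %*% H.
evar <- function(V, W, H) 1 - rss(V, W, H) / sum(V^2)

# Hoyer (2004) sparseness of a vector: 1 if exactly one entry is nonzero,
# 0 if all entries are equal and nonzero.
sparseness <- function(x) {
  n <- length(x)
  (sqrt(n) - sum(abs(x)) / sqrt(sum(x^2))) / (sqrt(n) - 1)
}

# Average sparseness of the basis vectors (columns of W) and of the
# coefficient profiles (rows of H, one per basis).
mean_sparseness <- function(W, H) {
  c(basis = mean(apply(W, 2, sparseness)), coef = mean(apply(H, 1, sparseness)))
}
```

An exact factorization gives an RSS of 0 and an explained variance of 1; noisier approximations lower evar towards 0.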

Details

Given an NMF algorithm and a target matrix, a common way of estimating $r$ is to try different values, compute some quality measures of the results, and choose the best value according to these quality criteria. See Brunet et al. (2004) and Hutchins et al. (2008).

The function nmfEstimateRank implements this estimation procedure. For each rank in range, it performs multiple NMF runs and returns a set of quality measures together with the associated consensus matrix.
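The consensus-based part of this procedure can be sketched in base R as follows. This is a simplified, hypothetical illustration rather than the code of nmfEstimateRank: all function names are illustrative, simple_nmf is a plain Lee-Seung multiplicative-update solver for the Euclidean loss (a stand-in, not the 'brunet' algorithm), and each sample is assumed to belong to the basis with its largest coefficient, as in Brunet et al. (2004). The dispersion coefficient follows Kim and Park (2007).

```r
# Stand-in NMF solver: multiplicative updates minimising ||V - WH||^2.
simple_nmf <- function(V, r, iter = 200, eps = 1e-9) {
  W <- matrix(runif(nrow(V) * r), nrow(V), r)   # random nonnegative init
  H <- matrix(runif(r * ncol(V)), r, ncol(V))
  for (i in seq_len(iter)) {
    H <- H * crossprod(W, V) / (crossprod(W) %*% H + eps)
    W <- W * tcrossprod(V, H) / (W %*% tcrossprod(H) + eps)
  }
  list(W = W, H = H)
}

# Consensus matrix: average over nrun runs of the connectivity matrices, where
# two samples (columns of V) are connected if they share the same dominant basis.
consensus_matrix <- function(V, r, nrun = 30) {
  C <- matrix(0, ncol(V), ncol(V))
  for (run in seq_len(nrun)) {
    cl <- apply(simple_nmf(V, r)$H, 2, which.max)  # cluster label per sample
    C <- C + outer(cl, cl, "==")                   # add this run's connectivity
  }
  C / nrun
}

# Cophenetic correlation: correlation between the distances 1 - C and the
# cophenetic distances of their average-linkage hierarchical clustering.
cophenetic_coef <- function(C) {
  d <- as.dist(1 - C)
  cor(d, cophenetic(hclust(d, method = "average")))
}

# Dispersion coefficient: equals 1 when every consensus entry is 0 or 1.
dispersion_coef <- function(C) sum(4 * (C - 0.5)^2) / length(C)

# One row per measure, one column per rank.
estimate_rank_sketch <- function(V, range, nrun = 30) {
  sapply(setNames(range, range), function(r) {
    C <- consensus_matrix(V, r, nrun)
    c(cophenetic = cophenetic_coef(C), dispersion = dispersion_coef(C))
  })
}
```

Brunet et al. (2004) suggest choosing the rank at which the cophenetic correlation coefficient begins to fall; a stable clustering across runs keeps both coefficients close to 1.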

References

Brunet, J.-P., Tamayo, P., Golub, T. R., and Mesirov, J. P. (2004). Metagenes and molecular pattern discovery using matrix factorization. Proc Natl Acad Sci U S A 101(12), 4164-4169.

Examples


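library(NMF)

# generate a noisy synthetic target matrix with n rows, m columns and rank r = 3
set.seed(123456)
n <- 50; r <- 3; m <- 20
V <- syntheticNMF(n, r, m, noise=TRUE)

# try ranks 2 to 5 with 10 'brunet' runs each; the seed is set before
# the first run of each rank, so that the estimation is reproducible
res.estimate <- nmfEstimateRank(V, seq(2,5), method='brunet', nrun=10, seed=123456)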

# plot all the measures
plot(res.estimate)
# or only one: e.g. the cophenetic correlation coefficient
plot(res.estimate, 'cophenetic')
