nmfEstimateRank

nmfEstimateRank helps in choosing an optimal rank by implementing simple approaches proposed in the literature.

Usage

nmfEstimateRank(x, range,
    method = nmf.getOption("default.algorithm"), nrun = 30,
    model = NULL, ..., verbose = FALSE, stop = FALSE)

## S3 method for class 'NMF.rank':
plot(x, y = NULL,
    what = c("all", "cophenetic", "rss", "residuals", "dispersion", "evar",
        "sparseness", "sparseness.basis", "sparseness.coef"),
    na.rm = FALSE, ...)

Arguments

x
    For nmfEstimateRank, a target object to be estimated, in one of the formats accepted by interface nmf. For plot.NMF.rank, an object of class NMF.rank as returned by function nmfEstimateRank.

range
    a numeric vector containing the ranks of factorization to try. Note that duplicates are removed and values are sorted in increasing order. The results are returned in this order.

method
    the NMF algorithm to use, in one of the formats accepted by nmf.

nrun
    a numeric giving the number of runs to perform for each value in range.

model
    model specification passed to each nmf call. In particular, when x is a formula, it is passed to argument data of nmfModel to determine the target matrix.

verbose
    toggles verbosity of the outer loop over the values in range. To print verbose (resp. debug) messages from each NMF run, one can use .options='v' (resp. .options='d').

stop
    logical that controls how errors are handled. When TRUE, the whole execution will stop if any error is raised. When FALSE (default), the runs that raise an error will be skipped, and the execution continues.

...
    For nmfEstimateRank, these are extra parameters passed to interface nmf. Note that the same parameters are used for each value of the rank. See nmf. For plot.NMF.rank, these are extra graphical parameters.

na.rm
    logical that specifies whether ranks whose measures are NA should be removed from the plot (default: FALSE). This is useful when plotting results which include NAs due to errors during the estimation process.

y
    a reference object of class NMF.rank, as returned by function nmfEstimateRank. The measures contained in y are used and plotted as a reference. It is typically used to plot results obtained from randomized data.

what
    a character string that partially matches one of the following items: "all", "cophenetic", "rss", "residuals", "dispersion", "evar", "sparseness", "sparseness.basis", "sparseness.coef".

Value

nmfEstimateRank
returns an S3 object (i.e. a list) of class NMF.rank with the following elements:

measures
    a data.frame containing the quality measures for each rank of factorization in range. Each row corresponds to a measure, each column to a rank.

consensus
    a list of consensus matrices, indexed by the rank of factorization (as a character string).

fit
    a list of the fits, indexed by the rank of factorization (as a character string).

Details

The same estimation can also be run by calling nmf with a range of ranks.

Given an NMF algorithm and the target matrix, a common way of estimating the factorization rank $r$ is to try different values, compute some quality measures of the results, and choose the best value according to these quality criteria. See Brunet et al. (2004) and Hutchins et al. (2008).
The function nmfEstimateRank performs this estimation procedure. It runs NMF multiple times for each rank in a range of factorization ranks and, for each rank, returns a set of quality measures together with the associated consensus matrix.
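
To make one of these quality measures concrete, the sketch below computes a cophenetic correlation coefficient from a toy consensus matrix using base R only. It assumes the construction described by Brunet et al. (2004): the consensus matrix is the average of per-run connectivity matrices, and the coefficient is the correlation between the distances 1 - consensus and the cophenetic distances of an average-linkage clustering. The helpers connectivity and cophenetic_cor and the cluster assignments are hypothetical, made up for illustration; they are not part of the NMF package and may differ from its internal computation.

```r
# Hypothetical helper: connectivity matrix of one run
# (entry is 1 when two samples fall in the same cluster, 0 otherwise).
connectivity <- function(labels) {
  outer(labels, labels, "==") * 1
}

# Toy cluster assignments of 6 samples over 4 runs (made up for illustration).
runs <- list(
  c(1, 1, 1, 2, 2, 2),
  c(1, 1, 1, 2, 2, 2),
  c(1, 1, 2, 2, 2, 2),
  c(2, 2, 2, 1, 1, 1)
)

# Consensus matrix: average of the connectivity matrices over all runs.
consensus <- Reduce(`+`, lapply(runs, connectivity)) / length(runs)

# Hypothetical helper: correlation between the distances 1 - consensus and
# the cophenetic distances of an average-linkage hierarchical clustering.
cophenetic_cor <- function(consensus) {
  d <- as.dist(1 - consensus)
  hc <- hclust(d, method = "average")
  cor(as.vector(d), as.vector(cophenetic(hc)))
}

rho <- cophenetic_cor(consensus)
print(rho)
```

A value close to 1 indicates that samples are assigned to the same clusters consistently across runs, which is why this measure is commonly inspected across ranks.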
In order to avoid overfitting, it is recommended to run the same procedure on randomized data. The results on the original and the randomized data can then be plotted on the same plots by passing the randomized results as argument y of the plot method.
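
As a rough illustration of what randomized data can look like, the base R sketch below permutes the entries of each column independently, which destroys the structure shared across columns while keeping each column's values. This is an assumption about the randomization scheme, not a statement of how the package's randomize function is implemented; permute_columns is a hypothetical helper and the matrix is made up.

```r
# Hypothetical helper: permute the entries of each column independently.
permute_columns <- function(V) {
  out <- apply(V, 2, sample)
  dimnames(out) <- dimnames(V)
  out
}

set.seed(42)
V <- matrix(runif(50 * 20), nrow = 50, ncol = 20)  # toy nonnegative matrix
rV <- permute_columns(V)

# Each column keeps its values (hence its sum), but rows no longer line up.
print(all.equal(colSums(V), colSums(rV)))
```

Quality measures estimated on such a matrix give a baseline: a rank is only worth retaining if the measures on the original data are clearly better than on the randomized data.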

References

Hutchins LN, Murphy SM, Singh P and Graber JH (2008). "Position-dependent motif characterization using non-negative matrix factorization." _Bioinformatics (Oxford, England)_, *24*(23), pp. 2684-90. ISSN 1367-4811.

Examples

library(NMF)

set.seed(123456)
# generate a synthetic target matrix with a known rank r = 3
n <- 50; r <- 3; m <- 20
V <- syntheticNMF(n, r, m)
# Use a seed that will be set before each first run
res <- nmfEstimateRank(V, seq(2,5), method='brunet', nrun=10, seed=123456)
# or equivalently
res <- nmf(V, seq(2,5), method='brunet', nrun=10, seed=123456)
# plot all the measures
plot(res)
# or only one: e.g. the cophenetic correlation coefficient
plot(res, 'cophenetic')
# run same estimation on randomized data
rV <- randomize(V)
rand <- nmfEstimateRank(rV, seq(2,5), method='brunet', nrun=10, seed=123456)
plot(res, rand)