nmf
Running NMF algorithms
The function nmf
is a S4 generic defines the main
interface to run NMF algorithms within the framework
defined in package NMF
. It has many methods that
facilitates applying, developing and testing NMF
algorithms.
The package vignette vignette('NMF')
contains an
introduction to the interface, through a sample data
analysis.
 Keywords
 methods
Usage
nmf(x, rank, method, ...)
"nmf"(x, rank, method, seed = NULL, model = NULL, ...)
"nmf"(x, rank, method, ..., .parameters = list())
"nmf"(x, rank, method, seed, model = "NMFstd", ..., name, objective = "euclidean", mixed = FALSE)
"nmf"(x, rank, method, seed, ...)
"nmf"(x, rank, method, seed, ...)
"nmf"(x, rank, method, seed, model = list(), ...)
"nmf"(x, rank, method, ..., model = NULL)
"nmf"(x, rank, method, seed = nmf.getOption("default.seed"), rng = NULL, nrun = if (length(rank) > 1) 30 else 1, model = NULL, .options = list(), .pbackend = nmf.getOption("pbackend"), .callback = NULL, ...)
Arguments
 x
 target data to fit, i.e. a matrixlike object
 rank
 specification of the factorization rank. It
is usually a single numeric value, but other type of
values are possible (e.g. matrix), for which specific
methods are implemented. See for example methods
nmf,matrix,matrix,ANY
.If
rank
is a numeric vector with more than one element, e.g. a range of ranks, thennmf
performs the estimation procedure described innmfEstimateRank
.  method
 specification of the NMF algorithm. The
most common way of specifying the algorithm is to pass
the access key (i.e. a character string) of an algorithm
stored in the package's dedicated registry, but methods
exists that handle other types of values, such as
function
orlist
object. See their descriptions in section Methods.If
method
is missing the algorithm to use is obtained from the optionnmf.getOption('default.algorithm')
, unless it can be infer from the type of NMF model to fit, if this later is available from other arguments. Factory fresh default value is ‘brunet’, which corresponds to the standard NMF algorithm from Brunet2004 (see section Algorithms).Cases where the algorithm is inferred from the call are when an NMF model is passed in arguments
rank
orseed
(see description fornmf,matrix,numeric,NULL
in section Methods).  ...
 extra arguments to allow extension of the
generic. Arguments that are not used in the chain of
internal calls to
nmf
methods are passed to the function that effectively implements the algorithm that fits an NMF model onx
.  .parameters
 list of methodspecific parameters.
Its elements must have names matching a single method
listed in
method
, and be lists of named values that are passed to the corresponding method.  name
 name associated with the NMF algorithm
implemented by the function
method
[only used whenmethod
is a function].  objective
 specification of the objective function
associated with the algorithm implemented by the function
method
[only used whenmethod
is a function].It may be either
'euclidean'
or'KL'
for specifying the euclidean distance (Frobenius norm) or the KullbackLeibler divergence respectively, or a function with signature(x="NMF", y="matrix", ...)
that computes the objective value for an NMF modelx
on a target matrixy
, i.e. the residuals between the target matrix and its NMF estimate. Any extra argument may be specified, e.g.function(x, y, alpha, beta=2, ...)
.  mixed
 a logical that indicates if the algorithm
implemented by the function
method
support mixedsign target matrices, i.e. that may contain negative values [only used whenmethod
is a function].  seed
 specification of the starting point or
seeding method, which will compute a starting point,
usually using data from the target matrix in order to
provide a good guess.
The seeding method may be specified in the following way:
 a
character
string:  giving the name of a registered seeding method. The corresponding method will be called to compute the starting point.
 a
list
:  giving the name of a registered seeding method and, optionally, extra parameters to pass to it.
 a single
numeric
:  that is used to seed the random number generator, before generating a random starting point.
 an object that inherits from
NMF
:  it should contain the data of an initialised NMF model, i.e. it must contain valid basis and mixture coefficient matrices, directly usable by the algorithm's workhorse function.
 a
function
:  that computes the starting
point. It must have signature
(object="NMF", target="matrix", ...)
and return an object that inherits from classNMF
. It is recommended to use argumentobject
as a template for the returned object, by only updating the basis and coefficient matrices, usingbasis<
andcoef<
respectively.  rng
 rng specification for the run(s). This argument should be used to set the the RNG seed, while still specifying the seeding method argument seed.
 model
 specification of the type of NMF model to
use.
It is used to instantiate the object that inherits from class
NMF
, that will be passed to the seeding method. The following values are supported:
NULL
, the default model associated to the NMF algorithm is instantiated and...
is lookedup for arguments with names that correspond to slots in the model class, which are passed to the functionnmfModel
to instantiate the model. Arguments in...
that do not correspond to slots are passed to the algorithm.
 a single
character
string, that is the name of the NMF model class to be instantiate. In this case, arguments in...
are handled in the same way as whenmodel
isNULL
. a
list
that contains named values that are passed to the functionnmfModel
to instantiate the model. In this case,...
is not lookedup at all, and passed entirely to the algorithm. This means that all necessary model parameters must be specified inmodel
.Argument/slot conflicts: In the case a parameter of the algorithm has the same name as a model slot, then
model
MUST be a list  possibly empty , if one wants this parameter to be effectively passed to the algorithm.If a variable appears in both arguments
model
and...
, the former will be used to initialise the NMF model, the latter will be passed to the NMF algorithm. See code examples for an illustration of this situation. 
 nrun
 number of runs to perform. It specifies the
number of runs to perform. By default only one run is
performed, except if
rank
is a numeric vector with more than one element, in which case a default of 30 runs per value of the rank are performed, allowing the computation of a consensus matrix that is used in selecting the appropriate rank (seeconsensus
).When using a random seeding method, multiple runs are generally required to achieve stability and avoid bad local minima.
 .options
 this argument is used to set runtime
options.
It can be a
list
containing named options with their values, or, in the case only boolean/integer options need to be set, a character string that specifies which options are turned on/off or their value, in a unixlike command line argument way.The string must be composed of characters that correspond to a given option (see mapping below), and modifiers '+' and '' that toggle options on and off respectively. E.g.
.options='tv'
will toggle on optionstrack
andverbose
, while.options='tv'
will toggle on optiontrack
and toggle off optionverbose
.Modifiers '+' and '' apply to all option character found after them:
tvp+k
meanstrack=TRUE
,verbose=parallel=FALSE
, andkeep.all=TRUE
. The default behaviour is to assume that.options
starts with a '+'.for options that accept integer values, the value may be appended to the option's character e.g.
'p4'
for asking for 4 processors or'v3'
for showing verbosity message up to level 3.The following options are available (the characters after “” are those to use to encode
.options
as a string):  debug  d
 Toggle debug mode (default:
FALSE
). Like optionverbose
but with more information displayed.  keep.all  k
 used when performing multiple runs
(
nrun
>1): ifTRUE
, all factorizations are saved and returned (default:FALSE
). Otherwise only the factorization achieving the minimum residuals is returned.  parallel  p
 this option is useful on multicore
*nix or Mac machine only, when performing multiple runs
(
nrun
> 1) (default:TRUE
). IfTRUE
, the runs are performed using the parallel foreach backend defined in argument.pbackend
. If this is set to'mc'
or'par'
thennmf
tries to perform the runs using multiple cores with packagelink[doParallel]{doParallel}
 which therefore needs to be installed.  parallel.required  P
 Same as
p
, but an error is thrown if the computation cannot be performed in parallel or with the specified number of processors.  shared.memory  m
 toggle usage of shared memory
(requires the synchronicity package). Default is as
defined by
nmf.getOption('shared.memory')
.  restore.seed  r
 deprecated option since version 0.5.99. Will throw a warning if used.
 simplifyCB  S
 toggle simplification of the
callback results. Default is
TRUE
 track  t
 enables error tracking (default:
FALSE). If
TRUE
, the returned object's slotresiduals
contains the trajectory of the objective values, which can be retrieved viaresiduals(res, track=TRUE)
This tracking functionality is available for all builtin algorithms.  verbose  v
 Toggle verbosity (default:
FALSE
). IfTRUE
, messages about the configuration and the state of the current run(s) are displayed. The level of verbosity may be specified with an integer value, the greater the level the more messages are displayed. ValueFALSE
means no messages are displayed, while valueTRUE
is equivalent to verbosity level 1.  .pbackend
 specification of the
foreach
parallel backend to register and/or use when running in parallel mode. See optionsp
andP
in argument.options
for how to enable this mode. Note that any backend that is internally registered is cleanedup on exit, so that the calling foreach environment should not be affected by a call tonmf
 except when.pbackend=NULL
.Currently it accepts the following values:
 ‘par’
 use the backend(s) defined by the
package
doParallel
;  a numeric value
 use the specified number of cores with
doParallel
backend;  ‘seq’
 use the
foreach sequential backend
doSEQ
; NULL
 use currently registered backend;
NA
 do not compute using a foreach loop 
and therefore not in parallel  but rather use a call to
standard
sapply
. This is useful for when developing/debugging NMF algorithms, as foreach loop handling may sometime get in the way.  ‘mc’
 identical to ‘par’ and defined to ensure backward compatibility.
 .callback
 Used when option
keep.all=FALSE
(default). It allows to pass a callback function that is called after each run when performing multiple runs (i.e. withnrun>1
). This is useful for example if one is also interested in saving summary measures or process the result of each NMF fit before it gets discarded. After each run, the callback function is called with two arguments, theNMFfit
object that as just been fitted and the run number:.callback(res, i)
. For convenience, a function that takes only one argument or has signature(x, ...)
can still be passed in.callback
. It is wrapped internally into a dummy function with two arguments, only the first of which is passed to the actual callback function (see example withsummary
).The call is wrapped into a tryCatch so that callback errors do not stop the whole computation (see below).
The results of the different calls to the callback function are stored in a miscellaneous slot accessible using the method
$
forNMFfit
objects:res$.callback
. By defaultnmf
tries to simplify the list of callback result usingsapply
, unless option'simplifyCB'
isFASE
.If no error occurs
res$.callback
contains the list of values that resulted from the calling the callback function , ordered as the fits. If any error occurs in one of the callback calls, then the whole computation is not stopped, but the error message is stored inres$.callback
, in place of the result.See the examples for sample code.
Available methods can be listed via nmfSeed()
. See
its dedicated documentation for details on each available
registered methods (nmfSeed
).
Note that when performing multiple runs, the L'Ecuyer's RNG is used in order to produce a sequence of random streams, that is used in way that ensures that parallel computation are fully reproducible.
If equal to an integer, then nmf
tries to perform
the computation on the specified number of processors.
When passing options as a string the number is appended
to the option's character e.g. 'p4'
for asking for
4 processors.
If FALSE
, then the computation is performed
sequentially using the base function
sapply
.
Unlike option 'P' (capital 'P'), if the computation cannot be performed in parallel, then it will still be carried on sequentially.
IMPORTANT NOTE FOR MAC OS X USERS: The parallel
computation is based on the doMC
and
multicore
packages, so the same care should be
taken as stated in the vignette of doMC
:
“it is not safe to use doMC from R.app on
Mac OS X. Instead, you should use doMC from a terminal
session, starting R from the command line.”
Note that this is equivalent to using
.options='p'
or .options='p0'
, but takes
precedence over any option specified in .options
:
e.g. nmf(..., .options='P10', .pbackend=NA)
performs all runs sequentially using sapply
. Use
nmf.options(pbackend=NA)
to completely disable
foreach/parallel computations for all subsequent
nmf
calls.
Details
The nmf
function has multiple methods that compose
a very flexible interface allowing to:

combine NMF algorithms with seeding methods and/or
stopping/convergence criterion at runtime;
 perform multiple NMF runs, which are computed in
parallel whenever the host machine allows it;
 run multiple algorithms with a common set of parameters, ensuring a consistent environment (notably the RNG settings).
The workhorse method is
nmf,matrix,numeric,NMFStrategy
, which is
eventually called by all other methods. The other methods
provides convenient ways of specifying the NMF
algorithm(s), the factorization rank, or the seed to be
used. Some allow to directly run NMF algorithms on
different types of objects, such as data.frame
or
ExpressionSet
objects.
Value

The returned value depends on the run mode:
 Single run:
 An object of class
NMFfit
.  Multiple runs, single method:
 When
nrun > 1
andmethod
is notlist
, this method returns an object of classNMFfitX
.  Multiple runs, multiple methods:
 When
nrun > 1
andmethod
is alist
, this method returns an object of classNMFList
.
Methods
 nmf
signature(x = "data.frame", rank = "ANY", method = "ANY")
: Fits an NMF model on adata.frame
. The targetdata.frame
is coerced into a matrix withas.matrix
. nmf
signature(x = "matrix", rank = "numeric", method = "NULL")
: Fits an NMF model using an appropriate algorithm whenmethod
is not supplied. This method tries to select an appropriate algorithm amongst the NMF algorithms stored in the internal algorithm registry, which contains the type of NMF models each algorithm can fit. This is possible when the type of NMF model to fit is available from argumentseed
, i.e. if it is an NMF model itself. Otherwise the algorithm to use is obtained fromnmf.getOption('default.algorithm')
. This method is provided for internal usage, when called from othernmf
methods with argumentmethod
missing in the top call (e.g.nmf,matrix,numeric,missing
). nmf
signature(x = "matrix", rank = "numeric", method = "list")
: Fits multiple NMF models on a common matrix using a list of algorithms. The models are fitted sequentially withnmf
using the same options and parameters for all algorithms. In particular, irrespective of the way the computation is seeded, this method ensures that all fits are performed using the same initial RNG settings. This method returns an object of classNMFList
, that is essentially a list containing each fit. nmf
signature(x = "matrix", rank = "numeric", method = "character")
: Fits an NMF model onx
using an algorithm registered with access keymethod
. Argumentmethod
is partially match against the access keys of all registered algorithms (case insensitive). Available algorithms are listed in section Algorithms below or the introduction vignette. A vector of their names may be retrieved vianmfAlgorithm()
. nmf
signature(x = "matrix", rank = "numeric", method = "function")
: Fits an NMF model onx
using a custom algorithm defined the functionmethod
. The supplied function must have signature(x=matrix, start=NMF, ...)
and return an object that inherits from classNMF
. It will be called internally by the workhorsenmf
method, with an NMF model to be used as a starting point passed in its argumentstart
. Extra arguments in...
are passed tomethod
from the topnmf
call. Extra arguments that have no default value in the definition of the functionmethod
are required to run the algorithm (e.g. see argumentalpha
ofmyfun
in the examples). If the algorithm requires a specific type of NMF model, this can be specified in argumentmodel
that is handled as in the workhorsenmf
method (see description for this argument). nmf
signature(x = "matrix", rank = "NMF", method = "ANY")
: Fits an NMF model using the NMF modelrank
to seed the computation, i.e. as a starting point. This method is provided for convenience as a shortcut fornmf(x, nbasis(object), method, seed=object, ...)
It discards any value passed in argumentseed
and uses the NMF model passed inrank
instead. It throws a warning if argumentseed
not missing. Ifmethod
is missing, this method will call the methodnmf,matrix,numeric,NULL
, which will infer an algorithm suitable for fitting an NMF model of the class ofrank
. nmf
signature(x = "matrix", rank = "NULL", method = "ANY")
: Fits an NMF model using the NMF model supplied inseed
, to seed the computation, i.e. as a starting point. This method is provided for completeness and is equivalent tonmf(x, seed, method, ...)
. nmf
signature(x = "matrix", rank = "missing", method = "ANY")
: Method defined to ensure the correct dispatch to workhorse methods in case of argumentrank
is missing. nmf
signature(x = "matrix", rank = "numeric", method = "missing")
: Method defined to ensure the correct dispatch to workhorse methods in case of argumentmethod
is missing. nmf
signature(x = "matrix", rank = "matrix", method = "ANY")
: Fits an NMF model partially seeding the computation with a given matrix passed inrank
. The matrixrank
is used either as initial value for the basis or mixture coefficient matrix, depending on its dimension. Currently, such partial NMF model is directly used as a seed, meaning that the remaining part is left uninitialised, which is not accepted by all NMF algorithm. This should change in the future, where the missing part of the model will be drawn from some random distribution. Amongst builtin algorithms, only ‘snmf/l’ and ‘snmf/r’ support partial seeds, with only the coefficient or basis matrix initialised respectively. nmf
signature(x = "matrix", rank = "data.frame", method = "ANY")
: Shortcut fornmf(x, as.matrix(rank), method, ...)
. nmf
signature(x = "formula", rank = "ANY", method = "ANY")
: This method implements the interface for fitting formulabased NMF models. SeenmfModel
. Argumentrank
target matrix or formula environment. If not missing,model
must be alist
, adata.frame
or anenvironment
in which formula variables are searched for.
Optimized C++ vs. plain R
Lee and Seung's multiplicative updates are used by several NMF algorithms. To improve speed and memory usage, a C++ implementation of the specific matrix products is used whenever possible. It directly computes the updates for each entry in the updated matrix, instead of using multiple standard matrix multiplication. The algorithms that benefit from this optimization are: 'brunet', 'lee', 'nsNMF' and 'offset'. However there still exists plain R versions for these methods, which implement the updates as standard matrix products. These are accessible by adding the prefix '.R#' to their name: '.R#brunet', '.R#lee', '.R#nsNMF' and '.R#offset'.
Algorithms
All algorithms are accessible by their respective access key as listed below. The following algorithms are available:
 ‘brunet’
 Standard NMF, based on the
KullbackLeibler divergence, from Brunet et al. (2004). It
uses simple multiplicative updates from Lee et al. (2001),
enhanced to avoid numerical underflow. Default stopping criterion: invariance of the
connectivity matrix (see
nmf.stop.connectivity
).  ‘lee’
 Standard NMF based on the Euclidean
distance from Lee et al. (2001). It uses simple
multiplicative updates. Default stopping criterion: invariance of the
connectivity matrix (see
nmf.stop.connectivity
).  lsnmf
 LeastSquare NMF from Wang et al. (2006). It
uses modified versions of Lee and Seung's multiplicative
updates for the Euclidean distance, which incorporates
weights on each entry of the target matrix, e.g. to
reflect measurement uncertainty. Default stopping criterion: stationarity of the objective
function (see
nmf.stop.stationary
).  ‘nsNMF’
 Nonsmooth NMF from
PascualMontano et al. (2006). It uses a modified version of
Lee and Seung's multiplicative updates for the
KullbackLeibler divergence Lee et al. (2001), to fit a
extension of the standard NMF model, that includes an
intermediate smoothing matrix, meant meant to produce
sparser factors. Default stopping criterion: invariance of the
connectivity matrix (see
nmf.stop.connectivity
).  ‘offset’
 NMF with offset from
Badea (2008). It uses a modified version of Lee and
Seung's multiplicative updates for Euclidean distance
Lee et al. (2001), to fit an NMF model that includes an
intercept, meant to capture a common baseline and shared
patterns, in order to produce cleaner basis components. Default stopping criterion: invariance of the
connectivity matrix (see
nmf.stop.connectivity
).  ‘penmf’
 PatternExpression NMF from
Zhang2008. It uses multiplicative updates to
minimize an objective function based on the Euclidean
distance, that is regularized for effective expression of
patterns with basis vectors. Default stopping criterion: stationarity of the objective
function (see
nmf.stop.stationary
).  ‘snmf/r’, ‘snmf/l’
 Alternating
Least Square (ALS) approach from Kim et al. (2007). It
applies the nonnegative leastsquares algorithm from
Van Benthem et al. (2004) (i.e. fast combinatorial
nonnegative leastsquares for multiple righthand), to
estimate the basis and coefficient matrices alternatively
(see
fcnnls
). It minimises an Euclideanbased objective function, that is regularized to favour sparse basis matrices (for ‘snmf/l’) or sparse coefficient matrices (for ‘snmf/r’). Stopping criterion: builtin within the internal workhorse functionnmf_snmf
, based on the KKT optimality conditions.
Seeding methods
The purpose of seeding methods is to compute initial
values for the factor matrices in a given NMF model. This
initial guess will be used as a starting point by the
chosen NMF algorithm. The seeding method to use in combination with the
algorithm can be passed to interface nmf
through
argument seed
. The seeding seeding methods
available in registry are listed by the function
nmfSeed
(see list therein). Detailed examples of how to specify the seeding method
and its parameters can be found in the Examples
section of this man page and in the package's vignette.
References
Brunet J, Tamayo P, Golub TR and Mesirov JP (2004).
"Metagenes and molecular pattern discovery using matrix
factorization." _Proceedings of the National Academy of
Sciences of the United States of America_, *101*(12), pp.
41649. ISSN 00278424,
Lee DD and Seung H (2001). "Algorithms for nonnegative
matrix factorization." _Advances in neural information
processing systems_.
Wang G, Kossenkov AV and Ochs MF (2006). "LSNMF: a
modified nonnegative matrix factorization algorithm
utilizing uncertainty estimates." _BMC bioinformatics_,
*7*, pp. 175. ISSN 14712105,
PascualMontano A, Carazo JM, Kochi K, Lehmann D and Pascualmarqui RD (2006). "Nonsmooth nonnegative matrix factorization (nsNMF)." _IEEE Trans. Pattern Anal. Mach. Intell_, *28*, pp. 403415.
Badea L (2008). "Extracting gene expression profiles
common to colon and pancreatic adenocarcinoma using
simultaneous nonnegative matrix factorization." _Pacific
Symposium on Biocomputing. Pacific Symposium on
Biocomputing_, *290*, pp. 26778. ISSN 17935091,
Kim H and Park H (2007). "Sparse nonnegative matrix
factorizations via alternating nonnegativityconstrained
least squares for microarray data analysis."
_Bioinformatics (Oxford, England)_, *23*(12), pp.
1495502. ISSN 14602059,
Van Benthem M and Keenan MR (2004). "Fast algorithm for
the solution of largescale nonnegativityconstrained
least squares problems." _Journal of Chemometrics_,
*18*(10), pp. 441450. ISSN 08869383,
See Also
Examples
# Only basic calls are presented in this manpage.
# Many more examples are provided in the demo file nmf.R
## Not run:
# demo('nmf')
# ## End(Not run)
# random data
x < rmatrix(20,10)
# run default algorithm with rank 2
res < nmf(x, 2)
# specify the algorithm
res < nmf(x, 2, 'lee')
# get verbose message on what is going on
res < nmf(x, 2, .options='v')
## Not run:
# # more messages
# res < nmf(x, 2, .options='v2')
# # even more
# res < nmf(x, 2, .options='v3')
# # and so on ...
# ## End(Not run)