sig_estimate: Estimate Signature Number

Description

Use the NMF package to evaluate the optimal number of signatures. This function is used together with sig_extract. Users should call library(NMF) first. If NMF objects are returned, the result can be further visualized with NMF plot methods such as NMF::consensusmap() and NMF::basismap().

Usage

sig_estimate(
  nmf_matrix,
  range = 2:5,
  nrun = 10,
  use_random = FALSE,
  method = "brunet",
  seed = 123456,
  cores = 1,
  keep_nmfObj = FALSE,
  save_plots = FALSE,
  plot_basename = file.path(tempdir(), "nmf"),
  what = "all",
  pConstant = NULL,
  verbose = FALSE
)

Arguments

nmf_matrix

a matrix used for NMF decomposition, with rows indicating samples and columns indicating components.

range

a numeric vector containing the ranks of factorization to try. Note that duplicates are removed and values are sorted in increasing order. The results are notably returned in this order.
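
The normalization described above can be mimicked in base R. This is only a sketch of the documented behaviour, not the package's internal code, and `requested_range` is a hypothetical name chosen for illustration.

```r
# Hypothetical input with duplicates and out-of-order ranks
requested_range <- c(5, 3, 3, 2, 4)

# Drop duplicates and sort in increasing order, as the documentation describes
effective_range <- sort(unique(requested_range))
print(effective_range)
```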

nrun

a numeric value giving the number of runs to perform for each value in range. Setting nrun to 30~50 is enough to achieve a robust result.

use_random

if TRUE, generate random data from the input to test measurements. Default is FALSE.

method

specification of the NMF algorithm. Default is 'brunet'. Available methods for NMF decomposition are 'brunet', 'lee', 'ls-nmf', 'nsNMF', 'offset'.

seed

specification of the starting point or seeding method, which will compute a starting point, usually using data from the target matrix in order to provide a good guess.

cores

number of CPU cores used to run NMF.

keep_nmfObj

default is FALSE. If TRUE, keep the NMF objects from the runs; note that the result may be huge.

save_plots

if TRUE, save the signature number survey plot to the local machine.

plot_basename

when saving plots, set a custom basename for the file path.

what

a character vector whose elements partially match one of the following items, which correspond to the measures computed by summary on each multi-run NMF result: 'all', 'cophenetic', 'rss', 'residuals', 'dispersion', 'evar', 'silhouette' (and more specific .coef, .basis, .consensus), 'sparseness' (and more specific .coef, .basis). It specifies which measure must be plotted (what='all' plots all the measures).

pConstant

A small positive value (like 1e-9) to add to the matrix. Use it ONLY if the function throws a non-conformable arrays error.
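
As a rough illustration of what adding such a constant means, the base R sketch below shifts every entry of a toy matrix by 1e-9. It assumes the constant is simply added element-wise, as the description suggests; `toy_matrix`, `p_constant` and `shifted` are hypothetical names, and the data are made up.

```r
# Toy count matrix with an all-zero column (hypothetical data)
toy_matrix <- matrix(c(3, 0, 1,
                       2, 0, 4),
                     nrow = 2, byrow = TRUE)

# Assumed effect of pConstant: shift every entry by a tiny positive value
p_constant <- 1e-9
shifted <- toy_matrix + p_constant

# No entry is exactly zero any more, and the perturbation stays negligible
print(all(shifted > 0))
print(max(abs(shifted - toy_matrix)))
```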

verbose

if TRUE, print extra messages.

Value

a list containing information about the NMF runs and the rank survey.

Details

The most common approach is to choose the smallest rank for which the cophenetic correlation coefficient starts decreasing (used by this function). Another approach is to choose the rank for which the plot of the residual sum of squares (RSS) between the input matrix and its estimate shows an inflection point. For more custom features, use NMF::nmfEstimateRank directly.
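
The cophenetic rule can be sketched in base R as follows. The values are invented for illustration and do not come from a real NMF run. `ranks`, `cophenetic`, `drops` and `chosen_rank` are hypothetical names, and falling back to the largest rank when the coefficient never decreases is an assumption of this sketch, not documented behaviour.

```r
# Hypothetical survey values; not produced by a real NMF run
ranks <- 2:6
cophenetic <- c(0.97, 0.98, 0.99, 0.95, 0.93)

# diff() < 0 marks positions where the next rank has a lower coefficient
drops <- which(diff(cophenetic) < 0)

# Smallest rank after which the coefficient starts decreasing;
# fall back to the largest rank tried if it never decreases (assumption)
chosen_rank <- if (length(drops) > 0) ranks[drops[1]] else max(ranks)
print(chosen_rank)
```

With these made-up values the coefficient rises up to rank 4 and falls afterwards, so rank 4 is selected.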

References

Gaujoux, Renaud, and Cathal Seoighe. "A flexible R package for nonnegative matrix factorization." BMC bioinformatics 11.1 (2010): 367.

Examples

# NOT RUN {
load(system.file("extdata", "toy_copynumber_tally_M.RData",
  package = "sigminer", mustWork = TRUE
))
# }
# NOT RUN {
library(NMF)
cn_estimate <- sig_estimate(cn_tally_M$nmf_matrix,
  cores = 1, nrun = 5,
  verbose = TRUE
)
# }