A Bayesian variant of the NMF algorithm that infers the number of signatures through the automatic relevance determination (ARD) technique. This function delivers highly interpretable and sparse representations of both signature profiles and attributions, balancing data fitting against model complexity. Note that this method may introduce more signatures than expected, especially for copy number signatures, so I don't recommend using it to extract copy number signatures. See the details and references for more.
sig_auto_extract(
  nmf_matrix = NULL,
  result_prefix = "BayesNMF",
  destdir = tempdir(),
  method = c("L1W.L2H", "L1KL", "L2KL"),
  strategy = c("stable", "optimal"),
  K0 = 25,
  nrun = 10,
  niter = 2e+05,
  tol = 1e-07,
  cores = 1,
  optimize = FALSE,
  skip = FALSE,
  recover = FALSE
)
nmf_matrix: a matrix used for NMF decomposition, with rows indicating samples and columns indicating components.
result_prefix: prefix for result data files.
destdir: path to save data runs, default is tempdir().
method: default is "L1W.L2H", which uses an exponential prior for W and a half-normal prior for H (this method is used by the PCAWG project, see reference #3). You can also use "L1KL" to set exponential priors for both W and H, or "L2KL" to set half-normal priors for both W and H. The latter two methods were originally implemented in the SignatureAnalyzer software.
strategy: the selection strategy for returned data. Set 'stable' to get the optimal result from the most frequent K. Set 'optimal' to get the optimal result from all Ks. If you want to select another solution, please check get_bayesian_result.
K0: number of initial signatures.
nrun: number of independent simulations.
niter: the maximum number of iterations.
tol: tolerance for convergence.
cores: number of CPU cores to run NMF.
optimize: logical, for exposure optimization, especially useful for copy number signatures.
skip: if TRUE, it will skip runs that already have a stored result. This can be used to extend the number of runs, e.g. you first run 10 times and then want to extend to 20 times.
recover: if TRUE, try to recover the result from previous runs based on the input result_prefix, destdir and nrun. This is pretty useful for reproducing a result. Please use skip if you want to recover an unfinished job.
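The difference between the two strategy values can be pictured with a toy selection over per-run summaries. This is only a sketch in base R: the data frame, its column names, the "higher score is better" convention and the helper pick_run() are all hypothetical and are not part of sigminer.

```r
# Hypothetical summary of six independent runs; the columns and the
# "higher score is better" convention are assumptions for illustration only.
runs <- data.frame(
  run   = 1:6,
  K     = c(3, 4, 3, 5, 3, 4),
  score = c(10.2, 11.5, 10.9, 12.1, 10.4, 11.0)
)

# pick_run() is a hypothetical helper, not a sigminer function.
pick_run <- function(runs, strategy = c("stable", "optimal")) {
  # A vector default resolves to its first element, here "stable"
  strategy <- match.arg(strategy)
  if (strategy == "stable") {
    # Keep only the runs whose K is the most frequent one
    k_counts <- table(runs$K)
    k_mode <- as.numeric(names(k_counts)[which.max(k_counts)])
    runs <- runs[runs$K == k_mode, , drop = FALSE]
  }
  # Return the best-scoring run among the remaining candidates
  runs[which.max(runs$score), , drop = FALSE]
}

stable_pick  <- pick_run(runs, "stable")   # best run among those with the modal K
optimal_pick <- pick_run(runs, "optimal")  # best run over every K
```

In this toy table the two strategies disagree: 'stable' stays within the K that most runs converged to, while 'optimal' may return a K that appeared only once.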
a list with Signature class.
There are three methods available in this function: "L1W.L2H", "L1KL" and "L2KL". They use different priors for the Bayesian variant of the NMF algorithm (see the method parameter), which was described in reference #1 and implemented in the SignatureAnalyzer software (reference #2).
I copied the source code for the three methods from the Broad Institute and the supplementary files of reference #3, and wrote this higher-level function. It makes it easier for users to extract, visualize and analyze signatures in combination with other powerful functions in the sigminer package. Besides, I implemented parallel computation to speed up the calculation and gave the function an input and output structure similar to sig_extract().
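To give a feel for the two prior families named in the method values ("L1" for exponential, "L2" for half-normal), the sketch below writes both densities in base R. The parametrisation by a single scale lambda and the helper names are assumptions made for illustration; this is not the code used inside sig_auto_extract().

```r
# Exponential prior: p(x) = (1 / lambda) * exp(-x / lambda) for x >= 0.
# The scale parameter `lambda` is an assumed parametrisation.
prior_exponential <- function(x, lambda) {
  dexp(x, rate = 1 / lambda)
}

# Half-normal prior: a zero-mean normal with variance lambda, folded onto x >= 0.
prior_half_normal <- function(x, lambda) {
  ifelse(x >= 0, 2 * dnorm(x, mean = 0, sd = sqrt(lambda)), 0)
}

lambda <- 0.5

# Both are proper densities on [0, Inf): each area should be close to 1
area_exp <- integrate(prior_exponential, 0, Inf, lambda = lambda)$value
area_hn  <- integrate(prior_half_normal, 0, Inf, lambda = lambda)$value

# Compare the tails at a point well above the scale
tail_exp <- prior_exponential(3, lambda)
tail_hn  <- prior_half_normal(3, lambda)
```

For the same lambda, the exponential density keeps more mass on large values than the half-normal one, so the two families shrink the entries of W and H in different ways.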
Tan, Vincent YF, and Cédric Févotte. "Automatic relevance determination in nonnegative matrix factorization with the β-divergence." IEEE Transactions on Pattern Analysis and Machine Intelligence 35.7 (2012): 1592-1605.
Kim, Jaegil, et al. "Somatic ERCC2 mutations are associated with a distinct genomic signature in urothelial tumors." Nature genetics 48.6 (2016): 600.
Alexandrov, Ludmil, et al. "The repertoire of mutational signatures in human cancer." BioRxiv (2018): 322859.
sig_tally for getting variation matrix, sig_extract for extracting signatures using NMF package, sig_estimate for estimating signature number for sig_extract.
# NOT RUN {
library(sigminer)
# Load the toy copy number tally shipped with sigminer
load(system.file("extdata", "toy_copynumber_tally_M.RData",
  package = "sigminer", mustWork = TRUE
))
# Extract signatures with a single run
res <- sig_auto_extract(cn_tally_M$nmf_matrix, result_prefix = "Test_copynumber", nrun = 1)
# By default, all run files are stored in tempdir()
dir(tempdir(), pattern = "Test_copynumber")
# }
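The skip and recover arguments described above suggest a workflow like the following. It is a sketch only: it assumes sigminer is installed, reuses the toy data from the example, and the folder name "bayesnmf_demo" and prefix "Demo" are made up for illustration.

```r
# Only run when sigminer is available, so the sketch is safe to source anywhere
if (requireNamespace("sigminer", quietly = TRUE)) {
  library(sigminer)
  load(system.file("extdata", "toy_copynumber_tally_M.RData",
    package = "sigminer", mustWork = TRUE
  ))

  # Hypothetical output folder, so run files can be found again later
  out_dir <- file.path(tempdir(), "bayesnmf_demo")
  dir.create(out_dir, showWarnings = FALSE)

  # First pass: a single run
  res1 <- sig_auto_extract(cn_tally_M$nmf_matrix,
    result_prefix = "Demo", destdir = out_dir, nrun = 1
  )

  # Extend to two runs; skip = TRUE skips the run that is already stored
  res2 <- sig_auto_extract(cn_tally_M$nmf_matrix,
    result_prefix = "Demo", destdir = out_dir, nrun = 2, skip = TRUE
  )

  # Later: rebuild the result from the stored runs
  # (same result_prefix, destdir and nrun)
  res2_again <- sig_auto_extract(cn_tally_M$nmf_matrix,
    result_prefix = "Demo", destdir = out_dir, nrun = 2, recover = TRUE
  )
}
```

Using a dedicated destdir rather than the default tempdir() root keeps the stored runs of different jobs apart, which matters because recovery is keyed on result_prefix, destdir and nrun.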