
Note: a newer version (2.3.1) of this package is available.

Sigminer: Mutational Signature Analysis and Visualization in R

Overview

The cancer genome is shaped by various mutational processes over its lifetime, stemming from exogenous and cell-intrinsic DNA damage and from error-prone DNA replication. These processes leave behind characteristic mutational spectra, termed mutational signatures. This package, sigminer, helps users extract, analyze and visualize signatures from genome alteration records, thus providing new insight into cancer research.
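Before signatures can be extracted, genome alteration records are usually tallied into a per-sample catalog. For SBS data, the common catalog has 96 channels (6 pyrimidine-based substitution types times 4 possible 5' bases times 4 possible 3' bases). sigminer's sig_tally() performs this step on real data. The base-R helpers below are only a hypothetical illustration of the idea, assuming each SNV is already given as its reference-strand trinucleotide context plus the mutant base; the names sbs96_channels, revcomp and tally_sbs96 are not part of sigminer.

```r
# Hypothetical helper (not a sigminer function): the 96 SBS channel labels in
# "5'base[REF>ALT]3'base" notation, using C or T as the reference base.
sbs96_channels <- function() {
  subs <- c("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")
  bases <- c("A", "C", "G", "T")
  # expand.grid varies its first column fastest: A[C>A]A, A[C>A]C, ...
  grid <- expand.grid(right = bases, left = bases, sub = subs,
                      stringsAsFactors = FALSE)
  paste0(grid$left, "[", grid$sub, "]", grid$right)
}

# Hypothetical helper: reverse complement of a vector of DNA strings.
revcomp <- function(x) {
  vapply(x, function(s) {
    paste(rev(strsplit(chartr("ACGT", "TGCA", s), "")[[1]]), collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

# Hypothetical helper: count one sample's SNVs per SBS-96 channel.
# context: trinucleotide around each SNV on the reference strand (e.g. "TCA")
# alt:     mutant base at the middle position
tally_sbs96 <- function(context, alt) {
  # Purine (A/G) references are moved to the opposite strand
  flip <- substr(context, 2, 2) %in% c("A", "G")
  context[flip] <- revcomp(context[flip])
  alt[flip] <- chartr("ACGT", "TGCA", alt[flip])
  label <- paste0(substr(context, 1, 1), "[", substr(context, 2, 2), ">",
                  alt, "]", substr(context, 3, 3))
  channels <- sbs96_channels()
  counts <- table(factor(label, levels = channels))
  setNames(as.integer(counts), channels)
}

# Toy usage: three SNVs from one sample
counts <- tally_sbs96(context = c("TCA", "TGA", "ACG"), alt = c("T", "A", "A"))
counts[["T[C>T]A"]]  # 2: TGA with G>A becomes T[C>T]A on the opposite strand
counts[["A[C>A]G"]]  # 1
```

Collapsing purine references onto the pyrimidine strand is what keeps the catalog at 96 channels rather than 192.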

For a pipeline tool, please see sigflow, the companion command-line interface (CLI) developed alongside sigminer.

[Figures: example SBS, copy number, DBS, INDEL (i.e. ID) and genome rearrangement signature profiles]

Features

  • supports a standard de novo pipeline for identifying 5 types of signatures: copy number, SBS, DBS, INDEL and RS (genome rearrangement signature).
  • supports quantifying exposure for a single sample based on known signatures.
  • supports two methods for calling copy number signatures: one from Macintyre et al. 2018 and the other developed by our group.
  • supports association and group analysis and visualization for signatures.
  • supports two types of signature exposures: relative exposure (relative contribution of signatures in each sample) and absolute exposure (estimated number of variation records contributed by each signature in each sample).
  • supports basic summary and visualization of mutation profiles (powered by maftools) and copy number profiles.
  • supports parallel computation via R packages foreach, future and NMF.
  • efficient code powered by R packages data.table and tidyverse.
  • elegant plots powered by R packages ggplot2, ggpubr, cowplot and patchwork.
  • well tested with R package testthat and documented with R packages roxygen2, roxytest, pkgdown, etc., for reliable and reproducible research.
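Quantifying exposure for a single sample from known signatures, and the difference between absolute and relative exposure, can be sketched in a few lines of base R. In sigminer this job belongs to sig_fit(); the hypothetical fit_exposure() below does not reproduce its methods. It swaps in plain non-negative least squares, solved with base R's optim() under a lower bound of 0, and assumes a toy 6-channel catalog with signature columns that each sum to 1.

```r
# Hypothetical sketch (not sigminer's sig_fit()): estimate absolute exposures
# of known signatures in one sample by non-negative least squares.
# catalog: numeric vector of mutation counts per channel
# sigs:    channel-by-signature matrix whose columns each sum to 1
fit_exposure <- function(catalog, sigs) {
  loss <- function(e) sum((catalog - sigs %*% e)^2)
  # Analytic gradient of the squared error: -2 * t(S) %*% (catalog - S e)
  grad <- function(e) as.vector(-2 * crossprod(sigs, catalog - sigs %*% e))
  start <- rep(sum(catalog) / ncol(sigs), ncol(sigs))
  res <- optim(start, loss, grad, method = "L-BFGS-B", lower = 0)
  setNames(res$par, colnames(sigs))
}

# Toy reference signatures over 6 channels (hypothetical values)
sigs <- cbind(
  SigA = c(0.50, 0.30, 0.10, 0.05, 0.03, 0.02),
  SigB = c(0.05, 0.05, 0.40, 0.40, 0.05, 0.05),
  SigC = c(0.02, 0.03, 0.05, 0.10, 0.30, 0.50)
)
catalog <- as.vector(sigs %*% c(200, 100, 0))  # a sample with 300 mutations

abs_expo <- fit_exposure(catalog, sigs)  # approx. 200, 100, 0 mutations
rel_expo <- abs_expo / sum(abs_expo)     # approx. 0.667, 0.333, 0
```

Because each signature column sums to 1, the absolute exposure is directly a number of mutations; dividing by the sample total turns it into the relative contribution.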

Key Interfaces and Their Performance

Sigminer provides many approaches for extracting mutational signatures. To compare their performance, I used 4 mutation catalog datasets from reference #6. Each dataset is composed of 30 samples, and 10 COSMIC v2 (SBS) signatures are randomly assigned to each sample with random signature exposures. The following table shows, for each approach and setting, how many signatures are recovered and the corresponding average cosine similarity to the COSMIC reference signatures.

| Approach | Selection Way | Setting | Caller | Recommend | Driver | Set1 | Set2 | Set3 | Set4 | Success Rate / Mean Similarity | Run Time | Note |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Standard NMF | Manual | Default. 50 runs (estimation) + 100 runs (extraction) | sig_estimate, sig_extract | YES ⭐⭐⭐ | R | 10 (0.884) | 10 (0.944) | 9 or 10 (0.998) | 10 (0.994) | ~90% / 0.955 | ~1 min (8 cores) | A basic method, suitable for good-quality mutation data with enough mutations. |
| SigProfiler | Manual/Automatic | Default. 100 runs | sigprofiler_extract | YES ⭐⭐⭐⭐ | Python/Anaconda | 10 (0.961) | 10 (0.999) | 10 (0.990) | 10 (0.997) | 100% / 0.987 | ~1 h (8 cores) | A gold-standard-like approach in this field, but its longer run time and its requirement for a Python environment and extra-large packages make it less popular here. |
| Best Practice | Manual/Automatic | Use bootstrapped catalog (1000 runs) | bp_extract_signatures | YES ⭐⭐⭐⭐⭐ | R | 10 (0.973) | 10 (0.990) | 10 (0.992) | 10 (0.971) | 100% / 0.981 | ~10 min (8 cores) | My R implementation of the methods from references #5 and #6. Should be the best option here. (Pay attention to the suggested solution.) |
| Best Practice | Manual/Automatic | Use original catalog (1000 runs) | bp_extract_signatures | NO ⭐ | R | 10 (0.987) | 9 (0.985) | 10 (0.997) | 9 (0.987) | 50% / 0.989 | ~10 min (8 cores) | Included for comparison with the bootstrapped-catalog approach above and with standard NMF. |
| Bayesian NMF | Automatic | L1KL+optimal (20 runs) | sig_auto_extract | YES ⭐⭐⭐ | R | 10 (0.994) | 9 (0.997) | 9 (0.998) | 9 (0.999) | 25% / 0.997 | ~10 min (8 cores) | The Bayesian NMF approach automatically reduces the signature number to a proper value from an initial signature number, here 20. |
| Bayesian NMF | Automatic | L1KL+stable (20 runs) | sig_auto_extract | YES ⭐⭐⭐⭐ | R | 10 (0.994) | 9 (0.997) | 10 (0.988) | 9 (0.999) | 50% / 0.995 | ~10 min (8 cores) | See above. |
| Bayesian NMF | Automatic | L2KL+optimal (20 runs) | sig_auto_extract | NO ⭐ | R | 12 (0.990) | 13 (0.988) | 12 (0.902) | 12 (0.994) | 0% / 0.969 | ~10 min (8 cores) | See above. |
| Bayesian NMF | Automatic | L2KL+stable (20 runs) | sig_auto_extract | NO ⭐ | R | 12 (0.990) | 12 (0.988) | 12 (0.902) | 12 (0.994) | 0% / 0.969 | ~10 min (8 cores) | See above. |
| Bayesian NMF | Automatic | L1WL2H+optimal (20 runs) | sig_auto_extract | YES ⭐⭐⭐ | R | 9 (0.989) | 9 (0.999) | 9 (0.996) | 9 (1.000) | 0% / 0.996 | ~10 min (8 cores) | See above. |
| Bayesian NMF | Automatic | L1WL2H+stable (20 runs) | sig_auto_extract | YES ⭐⭐⭐⭐ | R | 9 (0.989) | 9 (0.999) | 9 (0.996) | 9 (1.000) | 0% / 0.996 | ~10 min (8 cores) | See above. |
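The "Standard NMF" approach factorizes a channel-by-sample catalog V into signatures W and exposures H so that V ≈ WH, with all entries non-negative. As a minimal sketch of that idea only, and not the code path used by sig_extract() or the NMF package, the hypothetical nmf_kl() below implements Lee and Seung's multiplicative updates for the Kullback-Leibler divergence in base R, assuming a toy catalog built from two made-up signatures.

```r
# Hypothetical sketch: KL-divergence NMF with Lee & Seung multiplicative updates.
# V: channel-by-sample non-negative matrix; k: number of signatures.
nmf_kl <- function(V, k, n_iter = 2000, eps = 1e-10, seed = 1) {
  set.seed(seed)
  W <- matrix(runif(nrow(V) * k), nrow(V), k)  # signatures (channels x k)
  H <- matrix(runif(k * ncol(V)), k, ncol(V))  # exposures (k x samples)
  for (i in seq_len(n_iter)) {
    # H update: H * (t(W) %*% (V / WH)) / colSums(W), row-wise per signature
    WH <- W %*% H + eps
    H <- H * crossprod(W, V / WH) / (colSums(W) + eps)
    # W update: W * ((V / WH) %*% t(H)) / rowSums(H), column-wise per signature
    WH <- W %*% H + eps
    W <- W * sweep(tcrossprod(V / WH, H), 2, rowSums(H) + eps, "/")
  }
  # Normalize each signature to sum to 1 and move the scale into H
  scale <- colSums(W)
  list(W = sweep(W, 2, scale, "/"), H = H * scale)
}

# Toy data: 6 channels x 10 samples generated from 2 hypothetical signatures
W_true <- cbind(c(0.50, 0.30, 0.10, 0.05, 0.03, 0.02),
                c(0.02, 0.03, 0.05, 0.10, 0.30, 0.50))
H_true <- rbind(c(100, 80, 60, 40, 20, 10, 5, 50, 70, 90),
                c(10, 20, 40, 60, 80, 100, 90, 50, 30, 5))
V <- W_true %*% H_true

fit <- nmf_kl(V, k = 2)
sum(abs(V - fit$W %*% fit$H)) / sum(V)  # relative reconstruction error
```

Multiplicative updates keep W and H non-negative without explicit constraints, but they only reach a local optimum; this is one reason the table above repeats extraction over many runs.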

NOTE: although the Bayesian NMF approach with the L1KL or L1WL2H prior cannot recover all 10 signatures here, it automatically gets close to the true answer starting from an initial signature number of 20, and the resulting signatures are highly similar to the reference signatures. This also reminds us that we should not rely on this method to find signatures with small contributions in tumors.
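The similarity figures in the table compare each extracted signature with reference signatures by cosine similarity. sigminer ships cosine() and get_sig_similarity() for this; the hypothetical cosine_matrix() below only re-implements the arithmetic in base R, and the signature values are made up for illustration.

```r
# Hypothetical helper (base R, not sigminer's cosine()): pairwise cosine
# similarity between the columns of A and the columns of B.
cosine_matrix <- function(A, B) {
  norm_a <- sqrt(colSums(A^2))
  norm_b <- sqrt(colSums(B^2))
  crossprod(A, B) / outer(norm_a, norm_b)
}

# Simple check: cos(angle between (1, 0) and (1, 1)) is 1 / sqrt(2)
cosine_matrix(cbind(c(1, 0)), cbind(c(1, 1)))

# Toy reference and extracted signatures (hypothetical values)
reference <- cbind(
  SigA = c(0.50, 0.30, 0.10, 0.05, 0.03, 0.02),
  SigB = c(0.05, 0.05, 0.40, 0.40, 0.05, 0.05)
)
extracted <- cbind(
  X1 = c(0.48, 0.31, 0.11, 0.05, 0.03, 0.02),
  X2 = c(0.06, 0.04, 0.42, 0.38, 0.05, 0.05)
)
sim <- cosine_matrix(extracted, reference)  # rows: extracted, cols: reference
best <- apply(sim, 1, max)                  # best reference match per signature
mean(best)                                  # average similarity over signatures
```

Taking the best match per extracted signature is the simplest assignment rule; a one-to-one matching between extracted and reference signatures would be stricter when two extracted signatures resemble the same reference.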

Installation

You can install the stable release of sigminer from CRAN with:

install.packages("sigminer", dependencies = TRUE)
# Or
BiocManager::install("sigminer", dependencies = TRUE)

You can install the development version of sigminer from Github with:

remotes::install_github("ShixiangWang/sigminer", dependencies = TRUE)
# For Chinese users, run 
remotes::install_git("https://gitee.com/ShixiangWang/sigminer", dependencies = TRUE)

You can also install sigminer from the bioconda channel with conda:

# Please check the version number of the bioconda release

# You can create a separate environment first with
# conda create -n sigminer
# conda activate sigminer
conda install -c bioconda -c conda-forge r-sigminer

Usage

Complete documentation of sigminer is available online at https://shixiangwang.github.io/sigminer-doc/ (for Chinese users, it is also available at https://shixiangwang.gitee.io/sigminer-doc/). All functions are organized and documented at https://shixiangwang.github.io/sigminer/reference/index.html (for Chinese users, also at https://shixiangwang.gitee.io/sigminer/reference/index.html). For usage of a specific function fun, run ?fun in your R console to see its documentation.

Citation

If you use sigminer in academic work, please cite one of the following papers.




References

Please cite the following references when you use any of the corresponding features. The references are also listed in the function documentation. Many thanks to these works; sigminer could not have been created without standing on the shoulders of these giants.

  1. Mayakonda, Anand, et al. “Maftools: efficient and comprehensive analysis of somatic variants in cancer.” Genome Research 28.11 (2018): 1747-1756.
  2. Gaujoux, Renaud, and Cathal Seoighe. “A flexible R package for nonnegative matrix factorization.” BMC Bioinformatics 11.1 (2010).
  3. Wickham, Hadley. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.
  4. Kim, Jaegil, et al. “Somatic ERCC2 mutations are associated with a distinct genomic signature in urothelial tumors.” Nature Genetics 48.6 (2016): 600.
  5. Alexandrov, Ludmil B., et al. “Deciphering signatures of mutational processes operative in human cancer.” Cell Reports 3.1 (2013): 246-259.
  6. Degasperi, Andrea, et al. “A practical framework and online tool for mutational signature analyses show intertissue variation and driver dependencies.” Nature Cancer 1.2 (2020): 249-263.
  7. Alexandrov, Ludmil B., et al. “The repertoire of mutational signatures in human cancer.” Nature 578.7793 (2020): 94-101.
  8. Macintyre, Geoff, et al. “Copy number signatures and mutational processes in ovarian carcinoma.” Nature Genetics 50.9 (2018): 1262.
  9. Tan, Vincent Y. F., and Cédric Févotte. “Automatic relevance determination in nonnegative matrix factorization with the β-divergence.” IEEE Transactions on Pattern Analysis and Machine Intelligence 35.7 (2012): 1592-1605.
  10. Bergstrom, Erik N., et al. “SigProfilerMatrixGenerator: a tool for visualizing and exploring patterns of small mutational events.” BMC Genomics 20 (2019): 685. https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-019-6041-2

LICENSE

The software is made available for non-commercial research purposes only under the MIT License. However, notwithstanding any provision of the MIT License, the software currently may not be used for commercial purposes without explicit written permission obtained by contacting Shixiang Wang (wangshx@shanghaitech.edu.cn) or Xue-Song Liu (liuxs@shanghaitech.edu.cn).

MIT © 2019-Present Shixiang Wang, Xue-Song Liu

MIT © 2018 Geoffrey Macintyre

MIT © 2018 Anand Mayakonda


Cancer Biology Group @ShanghaiTech

Research group led by Xue-Song Liu at ShanghaiTech University

Install

install.packages('sigminer')

Monthly Downloads

494

Version

1.2.5

License

MIT + file LICENSE

Maintainer

Shixiang Wang

Last Published

February 20th, 2021

Functions in sigminer (1.2.5)

  • MAF-class: Class MAF
  • centromeres.hg19: Location of Centromeres at Genome Build hg19
  • chromsize.hg19: Chromosome Size of Genome Build hg19
  • CopyNumber-class: Class CopyNumber
  • centromeres.mm10: Location of Centromeres at Genome Build mm10
  • add_labels: Add Text Labels to a ggplot
  • centromeres.hg38: Location of Centromeres at Genome Build hg38
  • add_h_arrow: Add Horizontal Arrow with Text Label to a ggplot
  • CN.features: Classification Table of Copy Number Features Devised by Wang et al. for Method 'W'
  • enrich_component_strand_bias: Perform Strand Bias Enrichment Analysis for a Given Sample-by-Component Matrix
  • cytobands.hg38: Location of Chromosome Cytobands at Genome Build hg38
  • get_sig_exposure: Get Signature Exposure from 'Signature' Object
  • get_cn_freq_table: Get CNV Frequency Table
  • get_sig_db: Get Curated Reference Signature Database
  • get_bayesian_result: Get Specified Bayesian NMF Result from Run
  • get_sig_cancer_type_index: Obtain Signature Index for Cancer Types
  • cosine: Calculate Cosine Measures
  • get_shannon_diversity_index: Get Shannon Diversity Index for Signatures
  • cytobands.mm10: Location of Chromosome Cytobands at Genome Build mm10
  • cytobands.hg19: Location of Chromosome Cytobands at Genome Build hg19
  • bp: A Best Practice for Signature Extraction and Exposure (Activity) Attribution
  • chromsize.hg38: Chromosome Size of Genome Build hg38
  • get_tidy_association: Get Tidy Signature Association Results
  • read_sv_as_rs: Read Structural Variation Data as RS Object
  • get_sig_similarity: Calculate Similarity between Identified Signatures and Reference Signatures
  • read_vcf: Read VCF Files as MAF Object
  • handle_hyper_mutation: Handle Hypermutant Samples
  • same_size_clustering: Same Size Clustering
  • get_adj_p: Get Adjusted P Values from Group Comparison
  • hello: Say Hello to Users
  • show_cn_freq_circos: Show Copy Number Variation Frequency Profile with Circos
  • show_cn_features: Show Copy Number Feature Distributions
  • show_sig_exposure: Plot Signature Exposure
  • get_sig_feature_association: Calculate Association between Signature Exposures and Other Features
  • output_sig: Output Signature Results
  • get_sig_rec_similarity: Get Reconstructed Profile Cosine Similarity
  • get_cn_ploidy: Get Ploidy from Absolute Copy Number Profile
  • output_bootstrap: Output Signature Bootstrap Fitting Results
  • output_fit: Output Signature Fitting Results
  • sig_estimate: Estimate Signature Number
  • show_sig_feature_corrplot: Draw Corrplot for Signature Exposures and Other Features
  • sig_extract: Extract Signatures through NMF
  • get_genome_annotation: Get Genome Annotation
  • %>%: Pipe Operator
  • read_copynumber: Read Absolute Copy Number Profile
  • scoring: Score Copy Number Profile
  • show_cn_group_profile: Show Summary Copy Number Profile for Sample Groups
  • get_tidy_parameter: Get Tidy Parameter from Flexmix Model
  • show_group_mapping: Map Groups using Sankey
  • show_sig_number_survey2: Show Comprehensive Signature Number Survey
  • show_groups: Show Signature Contribution in Clusters
  • show_cn_profile: Show Sample Copy Number Profile
  • group_enrichment: General Group Enrichment Analysis
  • subset.CopyNumber: Subsetting CopyNumber Object
  • simulation: Simulation Analysis
  • show_cn_circos: Show Copy Number Profile in Circos
  • show_catalogue: Show Alteration Catalogue Profile
  • tidyeval: Tidy Eval Helpers
  • read_copynumber_seqz: Read Absolute Copy Number Profile from Sequenza Result Directory
  • transcript.hg19: Merged Transcript Location at Genome Build hg19
  • sigprofiler: Extract Signatures with SigProfiler
  • simulated_catalogs: A List of Simulated SBS-96 Catalog Matrices
  • show_sig_profile: Show Signature Profile
  • output_tally: Output Tally Result in Barplots
  • transcript.mm10: Merged Transcript Location at Genome Build mm10
  • transcript.hg38: Merged Transcript Location at Genome Build hg38
  • sig_operation: Obtain or Modify Signature Information
  • sig_fit_bootstrap_batch: Exposure Instability Analysis of Signature Exposures with Bootstrapping
  • chromsize.mm10: Chromosome Size of Genome Build mm10
  • get_group_comparison: Get Comparison Result between Signature Groups
  • read_maf: Read MAF Files
  • get_groups: Get Sample Groups from Signature Decomposition Information
  • show_cor: A Simple and General Way for Association Analysis
  • show_cosmic: Show Signature Information in Web Browser
  • read_xena_variants: Read UCSC Xena Variant Format Data as MAF Object
  • report_bootstrap_p_value: Report P Values from Bootstrap Results
  • show_group_distribution: Show Grouped Variable Distribution
  • show_group_enrichment: Show Group Enrichment Result
  • show_sig_profile_heatmap: Show Signature Profile with Heatmap
  • show_sig_profile_loop: Show Signature Profile with Loop Way
  • show_cn_components: Show Copy Number Components
  • sig_fit_bootstrap: Obtain Bootstrap Distribution of Signature Exposures of a Certain Tumor Sample
  • sig_fit: Fit Signature Exposures with Linear Combination Decomposition
  • show_cosmic_sig_profile: Plot Reference (Mainly COSMIC) Signature Profile
  • sig_auto_extract: Extract Signatures through the Automatic Relevance Determination Technique
  • show_group_comparison: Plot Group Comparison Result
  • transform_seg_table: Transform Copy Number Table
  • show_cn_distribution: Show Copy Number Distribution either by Length or Chromosome
  • use_color_style: Set Color Style for Plotting
  • show_sig_bootstrap: Show Signature Bootstrap Analysis Results
  • show_sig_consensusmap: Show Signature Consensus Map
  • show_sig_fit: Show Signature Fit Result
  • show_sig_number_survey: Show Simplified Signature Number Survey
  • sig_convert: Convert Signatures between Different Genomic Distributions of Components
  • sigminer: sigminer: Extract, Analyze and Visualize Signatures for Genomic Variations
  • sig_tally: Tally a Genomic Alteration Object