ICAMS

In-depth Characterization and Analysis of Mutational Signatures (‘ICAMS’)

Purpose

‘ICAMS’ supports the analysis and visualization of experimentally elucidated mutational signatures, of the kind presented in Boot et al., “In-depth characterization of the cisplatin mutational signature in human cell lines and in esophageal and liver tumors”, Genome Research 2018, https://doi.org/10.1101/gr.230219.117, and in “Characterization of colibactin-associated mutational signature in an Asian oral squamous cell carcinoma and in other mucosal tumor types”, Genome Research 2020, https://doi.org/10.1101/gr.255620.119. ‘ICAMS’ has functions to read Variant Call Format (VCF) files, to collate the corresponding catalogs of mutational spectra, and to analyze and plot catalogs of mutational spectra and signatures. It handles both “counts-based” and “density-based” (i.e. represented as mutations per megabase) mutational spectra and signatures.

Installation

Get the stable version

IMPORTANT: Install the Bioconductor dependencies first:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("BSgenome")

This may be slow; please be patient.

Afterwards, install the stable version of ICAMS from CRAN at the R command line:

install.packages("ICAMS")

Get the development version

To use new features in the development version, you can install ICAMS from the master branch on GitHub, which may not be stable:

if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github(repo = "steverozen/ICAMS", ref = "master")

Alternatively, you can download the package source of a recent stable development version of ICAMS to your computer, then run:

if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_local(path = "path-to-package-source-file-on-your-computer")

Reference manual

https://github.com/steverozen/ICAMS/blob/v2.3.12-branch-cran/data-raw/ICAMS_2.3.12.pdf

Frequently asked questions

How do I normalize “counts-based” catalogs of mutational spectra or signatures to account for differing abundances of the source sequences of the mutations?

You can use the exported function TransformCatalog in ICAMS to normalize the data. Please refer to the documentation and examples of TransformCatalog for more details.
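The idea behind this kind of normalization can be sketched in base R without ICAMS. The counts and abundances below are made-up numbers, the 4-character mutation-type names are only illustrative, and the calculation is a simplified illustration under those assumptions. It is not the implementation of TransformCatalog, which handles more catalog types and reference genomes.

```r
# Toy SBS counts for three mutation types in one sample (made-up numbers).
# Each name is the source trinucleotide followed by the mutated base.
counts <- c("ACAA" = 120, "CCGT" = 45, "TTTC" = 300)

# Hypothetical abundances: how often each source trinucleotide occurs
# in the sequenced region.
abundance <- c("ACA" = 8.0e7, "CCG" = 1.1e7, "TTT" = 1.5e8)

# Source trinucleotide for each mutation type (first three characters).
source.trinuc <- substr(names(counts), 1, 3)

# Density: mutations per million occurrences of the source trinucleotide.
density <- counts / abundance[source.trinuc] * 1e6
names(density) <- names(counts)

print(round(density, 3))
```

Dividing by abundance makes spectra comparable across regions whose sequence composition differs, for example exome versus whole genome.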

Citing ICAMS

If you use ICAMS in your work, please cite:

Rozen SG, Jiang NH, Boot A, Liu M, Wu Y (2024). ICAMS: In-depth Characterization and Analysis of Mutational Signatures. R package version 2.3.12, https://CRAN.R-project.org/package=ICAMS.

Monthly Downloads

284

Version

2.3.12

License

GPL-3 | file LICENSE

Maintainer

Steve Rozen

Last Published

February 9th, 2024

Functions in ICAMS (2.3.12)

CheckAndFixChrNames

Check and, if possible, correct the chromosome names in a VCF data.frame.
CreateOnePPMFromSBSVCF

Create position probability matrix (PPM) for *one* sample from a Variant Call Format (VCF) file.
CheckAndFixChrNamesForTransRanges

Check and, if possible, correct the chromosome names in a trans.ranges data.table
CheckAndReturnIDMatrix

Check and return the ID mutation matrix
CheckAndReturnIDCatalog

Check and return ID catalog
CreateOneColIDMatrix

Create one column of the matrix for an indel catalog from *one* in-memory VCF.
CheckDBSClassInVCF

Check DBS mutation class in VCF with the corresponding DBS mutation matrix
CheckSBSClassInVCF

Check SBS mutation class in VCF with the corresponding SBS mutation matrix
CheckAndReorderRownames

Check whether the rownames of object are correct and, if so, put the rows in the correct order.
FindDelMH

Return the length of microhomology at a deletion
FindMaxRepeatDel

Return the number of repeat units in which a deletion is embedded
GetVAF

Extract the VAFs (variant allele frequencies) and read depth information from a VCF file
ICAMS

ICAMS: In-depth Characterization and Analysis of Mutational Signatures
CheckAndReturnDBSCatalogs

Check and return DBS catalogs
ConvertICAMSCatalogToSigProSBS96

Convert an ICAMS SBS96 Catalog to SigProfiler format
CheckAndReturnDBSMatrix

Check and return the DBS mutation matrix
CreateOneColDBSMatrix

Create the matrix for a DBS catalog for *one* sample from an in-memory VCF.
CreateOneColSBSMatrix

Create the matrix for an SBS catalog for *one* sample from an in-memory VCF.
GenerateEmptyKmerCounts

Generate an empty matrix of k-mer abundance
CreateExomeStrandedRanges

Create exome transcriptionally stranded regions
GenerateKmer

Generate all possible k-mers of length k.
MakeVCFDBSdf

Take DBS ranges and the original VCF and generate a VCF with dinucleotide REF and ALT alleles.
GetGenomeKmerCounts

Generate k-mer abundance from a given genome
GetMutationLoadsFromMutectVCFs

Get mutation loads information from Mutect VCF files.
MutectVCFFilesToCatalog

Create SBS, DBS and Indel catalogs from Mutect VCF files
PlotTransBiasGeneExp

Plot transcription strand bias with respect to gene expression values
CollapseCatalog

"Collapse" a catalog
CreatePentanucAbundance

Create pentanucleotide abundance
PlotPPM

Plot position probability matrix (PPM) for *one* sample from a Variant Call Format (VCF) file.
CreateTrinucAbundance

Create trinucleotide abundance
CreateDinucAbundance

Create dinucleotide abundance
PlotPPMToPdf

Plot position probability matrices (PPM) to a PDF file
MakeDataFrameFromVCF

Read in the data lines of a Variant Call Format (VCF) file
CheckSeqContextInVCF

Check that the sequence context information is consistent with the value of the column REF.
IsGRCm38

Test if object is BSgenome.Mmusculus.UCSC.mm10.
CreateTransRanges

Create a transcript range file from the raw GFF3 File
CreatePPMFromSBSVCFs

Create position probability matrices (PPM) from a list of SBS VCFs
ReadCatalogInternal

Internal read catalog function to be wrapped in a tryCatch
PlotTransBiasGeneExpToPdf

Plot transcription strand bias with respect to gene expression values to a PDF file
ReadDukeNUSCat192

Read a 192-channel spectra (or signature) catalog in Duke-NUS format
RemoveRangesOnBothStrand

Remove ranges that fall on both strands
CreateStrandedTrinucAbundance

Create stranded trinucleotide abundance
GetMutationLoadsFromStrelkaIDVCFs

Get mutation loads information from Strelka ID VCF files.
StrelkaIDVCFFilesToCatalogAndPlotToPdf

Create ID (small insertion and deletion) catalog from Strelka ID VCF files and plot them to PDF
StrelkaIDVCFFilesToCatalog

Create ID (small insertion and deletion) catalog from Strelka ID VCF files
ReadStrelkaSBSVCFs

Read Strelka SBS (single base substitutions) VCF files.
InferRownames

Infer the correct rownames for a matrix based on its number of rows
GetSequenceKmerCounts

Generate k-mer abundance from given nucleotide sequences
ReadMutectVCFs

Read Mutect VCF files.
ReadTranscriptRanges

Read transcript ranges and strand information from a gff3 format file. Use this one for the new, cut down gff3 file (2018 11 24)
MutectVCFFilesToCatalogAndPlotToPdf

Create SBS, DBS and Indel catalogs from Mutect VCF files and plot them to PDF
MutectVCFFilesToZipFile

Create a zip file which contains catalogs and plot PDFs from Mutect VCF files
InferCatalogInfo

Convert a data.table imported from an external catalog text file into an ICAMS internal catalog object of the appropriate type.
CreateStrandedDinucAbundance

Create stranded dinucleotide abundance
ReadMutectVCF

Read in the data lines of a Variant Call Format (VCF) file created by Mutect
RenameColumnsWithNameStrand

Is there any column in df with name "strand"? If there is, change its name to "strand_old" so that it will not conflict with code in other parts of the ICAMS package.
CreateTetranucAbundance

Create tetranucleotide abundance
GetStrandedKmerCounts

Generate stranded k-mer abundance from a given genome and gene annotation file
GetMutationLoadsFromStrelkaSBSVCFs

Get mutation loads information from Strelka SBS VCF files.
ReadStrelkaIDVCFs

Read Strelka ID (small insertion and deletion) VCF files
ReadStrelkaSBSVCF

Read in the data lines of an SBS VCF created by Strelka version 1
InferCatalogClassPrefix

These two functions are applicable only to internal ICAMS-formatted catalog objects.
all.abundance

K-mer abundances
WriteCatalogIndelSigPro

Write Indel Catalogs in SigProExtractor format
InferAbundance

Infer abundance given a matrix-like object and additional information.
VCFsToCatalogsAndPlotToPdf

Create SBS, DBS and Indel catalogs from VCFs and plot them to PDF
ReadCatalog

Read catalog
SplitStrelkaSBSVCF

Split an in-memory Strelka VCF into SBS, DBS, and variants involving > 2 consecutive bases
TestMakeCatalogFromStrelkaSBSVCFs

Make catalogs from the sample Strelka SBS VCF files and compare them with the expected catalog information.
SplitOneVCF

Split a VCF into SBS, DBS, and ID VCFs, plus a list of other mutations
SplitSBSVCF

Split an in-memory SBS VCF into pure SBSs, pure DBSs, and variants involving > 2 consecutive bases
TestPlotCatCOMPOSITE

Plot a SignatureAnalyzer COMPOSITE signature or catalog into separate PDFs
VCFsToDBSCatalogs

Create DBS catalogs from VCFs
Plot96PartOfCompositeToPDF

Plot the SBS96 part of a SignatureAnalyzer COMPOSITE signature or catalog
as.catalog

Create a catalog from a matrix, data.frame, or vector
ReadCatalogErrReturn

Get error message and either stop or create a null error output for read catalog
NormalizeGenomeArg

Take strings representing a genome and return the BSgenome object.
RenameColumnsWithNameVAF

Is there any column in df1 with name "VAF"? If there is, change its name to "VAF_old" so that it will not conflict with code in other parts of the ICAMS package.
SplitOneMutectVCF

Split a mutect2 VCF into SBS, DBS, and ID VCFs, plus a list of other mutations
revc

Reverse complement every string in string.vec
Restaple1536

Convert 1536-channel mutation-type identifiers like this "ACCGTA" -> "AC[C>A]GT"
RevcSBS96

Reverse complement strings that represent stranded SBSs
StopIfCatalogTypeIllegal

Stop if catalog.type is illegal.
TransformCatalog

Transform between counts and density spectrum catalogs and counts and density signature catalogs
WriteCat

Write a catalog to a file.
SplitListOfMutectVCFs

Split each Mutect VCF into SBS, DBS, and ID VCFs (plus VCF-like data frame with left-over rows)
TCFromCouSigCou

Source catalog type is counts or counts.signature
TCFromDenSigDen

density -> <anything>; density.signature -> density.signature, counts.signature
TranscriptRanges

Transcript ranges data
WriteCatalog

Write a catalog
FindMaxRepeatIns

Return the number of repeat units in which an insertion is embedded.
GeneExpressionData

Example gene expression data from two cell lines
TestMakeCatalogFromMutectVCFs

This function makes catalogs from the sample Mutect VCF file and compares them with the expected catalog information.
VCFsToIDCatalogs

Create ID (small insertion and deletion) catalog from ID VCFs
VCFsToSBSCatalogs

Create SBS catalogs from SBS VCFs
StopIfNrowIllegal

Stop if the number of rows in object is illegal
Unstaple96

Convert SBS96-channel mutation-type identifiers like this "A[C>A]T" -> "ACTA"
TestMakeCatalogFromStrelkaIDVCFs

Make catalogs from the sample Strelka ID VCF files and compare them with the expected catalog information.
VCFsToCatalogs

Create SBS, DBS and Indel catalogs from VCFs
ReadAndSplitVCFs

Read and split VCF files
ReadStapleGT96SBS

Read a 96-channel spectra (or signature) catalog where rownames are e.g. "A[C>A]T"
StrelkaSBSVCFFilesToCatalogAndPlotToPdf

Create SBS and DBS catalogs from Strelka SBS VCF files and plot them to PDF
StrelkaSBSVCFFilesToZipFile

Create a zip file which contains catalogs and plot PDFs from Strelka SBS VCF files
IsGRCh37

Test if object is BSgenome.Hsapiens.1000genomes.hs37d5.
ReadBedRanges

Read chromosome and position information from a bed format file.
RevcDBS144

Reverse complement strings that represent stranded DBSs
StandardChromNameNew

Standardize the chromosome name annotations for a data frame.
Restaple96

Convert 96-channel mutation-type identifiers like this "ACTA" -> "A[C>A]T"
StandardChromName

Standardize the chromosome name annotations for a data frame.
ReadStrelkaIDVCF

Read in the data lines of an ID VCF created by Strelka version 1
TransRownames.ID.SigPro.PCAWG

For indels, convert SigProfiler rownames into ICAMS/PCAWG7 rownames
GetCustomKmerCounts

Generate custom k-mer abundance from a given reference genome
GetConsensusVAF

Analogous to GetMutectVAF, calculating VAF and read depth from PCAWG7 consensus VCFs
IsGRCh38

Test if object is BSgenome.Hsapiens.UCSC.hg38.
TransRownames.ID.PCAWG.SigPro

For indels, convert ICAMS/PCAWG7 rownames into SigProfiler rownames
PlotCatalogToPdf

Plot catalog to a PDF file
PlotCatalog

Plot one spectrum or signature
VCFsToZipFile

Create a zip file which contains catalogs and plot PDFs from VCFs
VCFsToZipFileXtra

Analogous to VCFsToZipFile, also generates density CSV and PDF files in the zip archive.
ReadAndSplitStrelkaSBSVCFs

Read and split Strelka SBS VCF files
ReadAndSplitMutectVCFs

Read and split Mutect VCF files
ReadVCF

Read in the data lines of a Variant Call Format (VCF) file
SplitListOfStrelkaSBSVCFs

Split a list of in-memory Strelka SBS VCFs into SBS, DBS, and variants involving > 2 consecutive bases
ReadVCFs

Read VCF files
SplitListOfVCFs

Split each VCF into SBS, DBS, and ID VCFs (plus VCF-like data frame with left-over rows)
StopIfRegionIllegal

Stop if region is illegal.
StrelkaIDVCFFilesToZipFile

Create a zip file which contains ID (small insertion and deletion) catalog and plot PDF from Strelka ID VCF files
StrelkaSBSVCFFilesToCatalog

Create SBS and DBS catalogs from Strelka SBS VCF files
StopIfTranscribedRegionIllegal

Stop if region is illegal for an in-transcript catalog
Unstaple1536

Convert SBS1536-channel mutation-type identifiers like this "AC[C>A]GT" -> "ACCGTA"
Unstaple78

Convert DBS78-channel mutation-type identifiers like this "AC>GA" -> "ACGA"
AddSBSClass

Add SBS mutation class to an annotated SBS VCF
AnnotateIDVCF

Add sequence context to an in-memory ID (insertion/deletion) VCF, and confirm that they match the given reference genome
AnnotateDBSVCF

Add sequence context and transcript information to an in-memory DBS VCF
AnnotateSBSVCF

Add sequence context and transcript information to an in-memory SBS VCF
AddAndCheckSBSClassInVCF

Add and check SBS class in an annotated VCF with the corresponding SBS mutation matrix
AddTranscript

Add transcript information to a data frame with mutation records
AddSeqContext

Add sequence context to a data frame with mutation records
CanonicalizeID

Determine the mutation types of insertions and deletions.
CalBaseCountsFrom3MerAbundance

Calculate base counts from 3-mer abundance
CheckAndReturnSBSCatalogs

Check and return SBS catalogs
AddRunInformation

Create a run-information text file when generating a zip archive from VCF files.
AddAndCheckDBSClassInVCF

Add and check DBS class in an annotated VCF with the corresponding DBS mutation matrix
AddDBSClass

Add DBS mutation class to an annotated DBS VCF
Canonicalize1INS

Given an insertion and its sequence context, categorize it.
CatalogRowOrder

Standard order of row names in a catalog
CalculateNumberOfSpace

Calculate the number of spaces needed to add strand bias statistics to the run-information.txt file.
Canonicalize1Del

Given a deletion and its sequence context, categorize it
Canonicalize1ID

Given a single insertion or deletion in context, categorize it.
CheckAndReturnSBSMatrix

Check and return the SBS mutation matrix
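
Several entries in the list above convert between mutation-type identifier styles, for example "A[C>A]T" <-> "ACTA", or take the reverse complement of strings. The conversions themselves can be sketched in base R as below. The helper names ending in .sketch are hypothetical and are not the ICAMS functions; they only illustrate the string manipulation under the assumption that identifiers contain the bases A, C, G and T only.

```r
# Hypothetical helper names; these are sketches, not the ICAMS functions.

# "A[C>A]T" -> "ACTA": keep the three reference bases, then append the
# mutated base.
unstaple96.sketch <- function(x) {
  sub("^([ACGT])\\[([ACGT])>([ACGT])\\]([ACGT])$", "\\1\\2\\4\\3", x)
}

# "ACTA" -> "A[C>A]T": the inverse conversion.
restaple96.sketch <- function(x) {
  sub("^([ACGT])([ACGT])([ACGT])([ACGT])$", "\\1[\\2>\\4]\\3", x)
}

# Reverse complement each string in a character vector.
revc.sketch <- function(x) {
  vapply(strsplit(chartr("ACGT", "TGCA", x), ""),
         function(bases) paste(rev(bases), collapse = ""),
         character(1))
}

ids <- c("A[C>A]T", "G[T>C]A")
compact <- unstaple96.sketch(ids)
print(compact)                        # "ACTA" "GTAC"
print(restaple96.sketch(compact))     # "A[C>A]T" "G[T>C]A"
print(revc.sketch(c("ACT", "GGCA")))  # "AGT" "TGCC"
```

The compact 4-character form is convenient as a row name or lookup key, while the bracketed form is easier to read in plots and tables.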