ICAMS (version 2.3.12)

SplitStrelkaSBSVCF: Split an in-memory Strelka VCF into SBS, DBS, and variants involving > 2 consecutive bases

Description

SBSs are single base substitutions, e.g. C>T, A>G, ... DBSs are double base substitutions, e.g. CC>TT, AT>GG, ... Variants involving > 2 consecutive bases are rare, so this function just records them. These would be variants such as ATG>CCT, AGAT>TCTA, ...
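
The three categories above can be told apart by the number of reference bases involved. The following is a minimal sketch only; classify_substitution is a hypothetical helper written for illustration and is not part of ICAMS, and it assumes REF and ALT have equal length (pure substitutions, no indels).

```r
# Hypothetical helper, not an ICAMS function: label each variant by the
# number of consecutive bases it replaces.
classify_substitution <- function(ref, alt) {
  # Assumption: substitutions only, so REF and ALT have the same length.
  stopifnot(nchar(ref) == nchar(alt))
  n <- nchar(ref)
  ifelse(n == 1, "SBS", ifelse(n == 2, "DBS", "3+BS"))
}

# Examples taken from the description above.
classify_substitution(
  ref = c("C", "CC", "ATG", "AGAT"),
  alt = c("T", "TT", "CCT", "TCTA")
)
```

For these inputs, the first variant is labelled an SBS, the second a DBS, and the last two fall into the "more than two consecutive bases" group.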

Usage

SplitStrelkaSBSVCF(vcf.df, max.vaf.diff = 0.02, name.of.VCF = NULL)

Value

A list of in-memory objects with the elements:

  1. SBS.vcf: Data frame of pure SBS mutations -- no DBS or 3+BS mutations.

  2. DBS.vcf: Data frame of pure DBS mutations -- no SBS or 3+BS mutations.

  3. discarded.variants: Non-NULL only if there are variants that were excluded from the analysis. See the added column discarded.reason for details.

Arguments

vcf.df

An in-memory data frame containing the contents of a Strelka VCF file.

max.vaf.diff

The maximum allowed difference in VAF (variant allele frequency) between adjacent SBSs. The default is 0.02. If the absolute difference between the VAFs of adjacent SBSs is greater than max.vaf.diff, then these adjacent SBSs are likely to be "merely" asynchronous single base mutations, as opposed to a simultaneous doublet mutation or a variant involving more than two consecutive bases.
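
The threshold test described above can be sketched as follows. This is an illustration under stated assumptions, not the ICAMS implementation: is_merge_candidate is a hypothetical helper, both SBSs are assumed to lie on the same chromosome, and "adjacent" is taken to mean positions that differ by exactly 1.

```r
# Hypothetical helper, not an ICAMS function: decide whether two SBSs
# could be treated as one simultaneous multi-base variant.
is_merge_candidate <- function(pos1, vaf1, pos2, vaf2, max.vaf.diff = 0.02) {
  # Assumption: both variants are on the same chromosome.
  adjacent <- (pos2 - pos1) == 1
  # VAFs must not differ by more than max.vaf.diff.
  similar.vaf <- abs(vaf1 - vaf2) <= max.vaf.diff
  adjacent & similar.vaf
}

# VAF difference of about 0.01: plausible simultaneous doublet.
is_merge_candidate(100, 0.31, 101, 0.30)

# VAF difference of about 0.11: more likely two separate SBSs.
is_merge_candidate(100, 0.31, 101, 0.20)
```

With the default threshold, the first pair passes the test and the second does not. A larger max.vaf.diff treats more adjacent pairs as candidates, and a smaller one treats fewer.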

name.of.VCF

Name of the VCF file.