ICAMS (version 2.2.4)

SplitStrelkaSBSVCF: Split an in-memory Strelka VCF into SBS, DBS, and variants involving > 2 consecutive bases

Description

SBSs are single base substitutions, e.g. C>T, A>G, ... DBSs are double base substitutions, e.g. CC>TT, AT>GG, ... Variants involving > 2 consecutive bases are rare, so this function just records them. These would be variants such as ATG>CCT, AGAT>TCTA, ...
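
To make the three categories concrete, the sketch below labels a few variants by the number of consecutive bases they replace. The helper classify_variant is hypothetical and not part of ICAMS; it only illustrates the naming and does not reproduce how SplitStrelkaSBSVCF processes Strelka calls.

```r
# Hypothetical helper (not part of ICAMS): label a substitution by the
# number of consecutive bases it replaces.
classify_variant <- function(ref, alt) {
  stopifnot(length(ref) == length(alt))
  n <- nchar(ref)
  ifelse(n != nchar(alt), "not a base substitution",
         ifelse(n == 1, "SBS",
                ifelse(n == 2, "DBS", "3+BS")))
}

# The example alleles are the ones used in the description above.
ref <- c("C", "CC", "ATG", "AGAT")
alt <- c("T", "TT", "CCT", "TCTA")
labels <- classify_variant(ref, alt)
print(data.frame(ref, alt, labels))
```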

Usage

SplitStrelkaSBSVCF(vcf.df, max.vaf.diff = 0.02, name.of.VCF = NULL)

Arguments

vcf.df

An in-memory data frame containing the contents of a Strelka VCF file.

max.vaf.diff

The maximum difference of VAF. The default value is 0.02.

name.of.VCF

Name of the VCF file.

Value

A list of in-memory objects with the elements:

  1. SBS.vcf: Data frame of pure SBS mutations -- no DBS or 3+BS mutations.

  2. DBS.vcf: Data frame of pure DBS mutations -- no SBS or 3+BS mutations.

  3. discarded.variants: Non-NULL only if there are variants that were excluded from the analysis. See the extra column discarded.reason for more details.
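
Because discarded.variants may be NULL, code that consumes the returned list should probably check for it before reading discarded.reason. The sketch below does this on a mock list that mirrors the documented element names. The rows and the column names CHROM, POS, REF and ALT are assumptions made for illustration and do not come from ICAMS output.

```r
# Mock return value using the documented element names. The rows are made up
# for illustration; they are not output from SplitStrelkaSBSVCF.
split.vcfs <- list(
  SBS.vcf = data.frame(CHROM = "1", POS = 100L, REF = "C", ALT = "T",
                       stringsAsFactors = FALSE),
  DBS.vcf = data.frame(CHROM = "1", POS = 200L, REF = "CC", ALT = "TT",
                       stringsAsFactors = FALSE)
  # discarded.variants is absent (NULL): nothing was excluded in this mock
)

n.sbs <- nrow(split.vcfs$SBS.vcf)
n.dbs <- nrow(split.vcfs$DBS.vcf)

# Check for NULL before reading the discarded.reason column.
if (is.null(split.vcfs$discarded.variants)) {
  discarded.summary <- "no variants discarded"
} else {
  discarded.summary <- table(split.vcfs$discarded.variants$discarded.reason)
}
print(discarded.summary)
```

When discarded.variants is present, tabulating discarded.reason gives a quick count of how many variants were excluded for each reason.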