ICAMS (version 2.0.7)

SplitStrelkaSBSVCF: Split an in-memory Strelka VCF into SBS, DBS, and variants involving > 2 consecutive bases

Description

SBSs are single base substitutions, e.g. C>T, A>G, ... DBSs are double base substitutions, e.g. CC>TT, AT>GG, ... Variants involving > 2 consecutive bases are rare, so this function just records them. These would be variants such as ATG>CCT, AGAT>TCTA, ...
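The three categories above can be illustrated with a minimal sketch that labels a substitution by the number of consecutive bases it changes. ClassifySubstitution is a hypothetical helper written for this illustration; it is not part of ICAMS and does not reflect how SplitStrelkaSBSVCF is implemented.

```r
# Hypothetical helper (not an ICAMS function): label each substitution by
# the number of consecutive bases it replaces.
ClassifySubstitution <- function(ref, alt) {
  # Substitutions replace bases one-for-one, so REF and ALT must have the
  # same length; insertions and deletions are out of scope here.
  stopifnot(nchar(ref) == nchar(alt))
  n <- nchar(ref)
  ifelse(n == 1, "SBS", ifelse(n == 2, "DBS", "ThreePlus"))
}

# The example variants from the description: C>T, CC>TT, ATG>CCT, AGAT>TCTA
classes <- ClassifySubstitution(ref = c("C", "CC", "ATG", "AGAT"),
                                alt = c("T", "TT", "CCT", "TCTA"))
print(classes)
```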

Usage

SplitStrelkaSBSVCF(vcf.df, max.vaf.diff = 0.02)

Arguments

vcf.df

An in-memory data frame containing the contents of a Strelka VCF file.

max.vaf.diff

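The maximum difference in variant allele frequency (VAF) allowed between adjacent SBSs for them to be treated as one variant spanning consecutive bases rather than as separate SBSs. The default value is 0.02.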
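One way such a threshold might be applied is sketched below, assuming that adjacent SBSs are grouped into a single variant only when they sit at consecutive positions on the same chromosome and their VAFs differ by no more than max.vaf.diff. GroupAdjacentSBS, its arguments other than max.vaf.diff, and the pairwise comparison rule are assumptions made for illustration; the actual ICAMS logic may differ.

```r
# Hypothetical helper (not the ICAMS implementation): assign a run id to
# each SBS so that SBSs in the same run form one candidate variant.
# Assumes the input is already sorted by chromosome and position.
GroupAdjacentSBS <- function(chrom, pos, vaf, max.vaf.diff = 0.02) {
  n <- length(pos)
  if (n == 0) return(integer(0))
  # A new run starts when the chromosome changes, the position is not
  # contiguous with the previous SBS, or the VAFs differ by too much.
  starts.new.run <- c(TRUE,
                      chrom[-1] != chrom[-n] |
                        pos[-1] != pos[-n] + 1 |
                        abs(vaf[-1] - vaf[-n]) > max.vaf.diff)
  cumsum(starts.new.run)
}

# Illustrative values only
chrom <- c("1", "1", "1", "1", "2", "2", "2")
pos   <- c(100, 101, 200, 201, 50, 51, 52)
vaf   <- c(0.30, 0.31, 0.30, 0.50, 0.20, 0.21, 0.20)

run.id     <- GroupAdjacentSBS(chrom, pos, vaf)
run.length <- tabulate(run.id)
run.class  <- ifelse(run.length == 1, "SBS",
                     ifelse(run.length == 2, "DBS", "ThreePlus"))
print(data.frame(run = seq_along(run.length), run.length, run.class))
```

In this sketch the SBSs at positions 200 and 201 stay separate even though they are adjacent, because their VAFs differ by far more than 0.02.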

Value

A list of 3 in-memory objects with the elements:

  1. SBS.vcf: Data frame of pure SBS mutations -- no DBS or 3+BS mutations

  2. DBS.vcf: Data frame of pure DBS mutations -- no SBS or 3+BS mutations

  3. ThreePlus: Data table with the key CHROM, LOW.POS, HIGH.POS and additional information (reference sequence, alternative sequence, context, etc.). The additional information is not fully implemented at this point because of limited immediate biological interest.
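As a sketch of how the returned list might be inspected, the snippet below counts the rows in each of the three elements named above. CountSplitVariants is a hypothetical helper and mock.split is a stand-in built by hand with plain data frames and only a few columns; neither is produced by ICAMS, and a real result would carry the full VCF columns.

```r
# Hypothetical helper (not an ICAMS function): count the variants in each
# element of a list shaped like the value described above.
CountSplitVariants <- function(split.vcfs) {
  elements <- c("SBS.vcf", "DBS.vcf", "ThreePlus")
  stopifnot(all(elements %in% names(split.vcfs)))
  vapply(split.vcfs[elements], nrow, integer(1))
}

# Hand-made stand-in for illustration only
mock.split <- list(
  SBS.vcf   = data.frame(CHROM = c("1", "1"), POS = c(200, 201)),
  DBS.vcf   = data.frame(CHROM = "1", POS = 100),
  ThreePlus = data.frame(CHROM = "2", LOW.POS = 50, HIGH.POS = 52)
)

counts <- CountSplitVariants(mock.split)
print(counts)
```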