SBSs are single base substitutions, e.g. C>T, A>G, ... DBSs are double base substitutions, e.g. CC>TT, AT>GG, ... Variants involving more than 2 consecutive bases, such as ATG>CCT or AGAT>TCTA, are rare, so this function just records them.
SplitStrelkaSBSVCF(vcf.df, max.vaf.diff = 0.02)
An in-memory data frame containing the contents of a Strelka VCF file.
The maximum difference in variant allele frequency (VAF) between adjacent SBSs. The default value is 0.02.
A list of 3 in-memory objects with the elements:
SBS.vcf
: Data frame of pure SBS mutations -- no DBS or 3+BS mutations
DBS.vcf
: Data frame of pure DBS mutations -- no SBS or 3+BS mutations
ThreePlus
: Data table with the key CHROM, LOW.POS, HIGH.POS and additional information (reference sequence, alternative sequence, context, etc.). The additional information is not fully implemented at this point because of limited immediate biological interest.
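
The splitting described above can be illustrated with a minimal base R sketch. It is not the ICAMS implementation: the helper `split_adjacent_sbs` and the toy input columns (`CHROM`, `POS`, `REF`, `ALT`, `VAF`) are hypothetical, and it assumes that `max.vaf.diff` is compared between neighbouring SBSs on the same chromosome. The key column names `LOW.POS` and `HIGH.POS` are borrowed from the description of `ThreePlus`.

```r
# Hypothetical helper for illustration only; not part of ICAMS.
# Groups SBSs at consecutive positions with similar VAFs into runs, then
# sorts the runs into SBS (length 1), DBS (length 2) and 3+ base variants.
split_adjacent_sbs <- function(vcf.df, max.vaf.diff = 0.02) {
  stopifnot(nrow(vcf.df) > 0)
  # Sort so that adjacent variants become neighbouring rows.
  vcf.df <- vcf.df[order(vcf.df$CHROM, vcf.df$POS), ]
  n <- nrow(vcf.df)

  # A row joins the previous row's run if it is on the same chromosome,
  # at the next position, and its VAF differs by at most max.vaf.diff.
  # (Assumption: a simple pairwise check between neighbours.)
  joins.prev <- c(
    FALSE,
    vcf.df$CHROM[-1] == vcf.df$CHROM[-n] &
      vcf.df$POS[-1] == vcf.df$POS[-n] + 1 &
      abs(vcf.df$VAF[-1] - vcf.df$VAF[-n]) <= max.vaf.diff
  )
  run.id <- cumsum(!joins.prev)

  # Collapse each run into a single row with concatenated REF and ALT.
  runs <- do.call(rbind, lapply(split(vcf.df, run.id), function(run) {
    data.frame(
      CHROM    = run$CHROM[1],
      LOW.POS  = min(run$POS),
      HIGH.POS = max(run$POS),
      REF      = paste(run$REF, collapse = ""),
      ALT      = paste(run$ALT, collapse = ""),
      stringsAsFactors = FALSE
    )
  }))

  len <- nchar(runs$REF)
  list(
    SBS.vcf   = runs[len == 1, ],
    DBS.vcf   = runs[len == 2, ],
    ThreePlus = runs[len >= 3, ]
  )
}

# Toy input: made-up variants, not real Strelka output.
toy <- data.frame(
  CHROM = c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2", "chr2"),
  POS   = c(100, 101, 200, 201, 50, 51, 52),
  REF   = c("C", "C", "A", "G", "A", "T", "G"),
  ALT   = c("T", "T", "G", "A", "C", "C", "T"),
  VAF   = c(0.30, 0.31, 0.25, 0.60, 0.40, 0.41, 0.40),
  stringsAsFactors = FALSE
)

res <- split_adjacent_sbs(toy)
print(res$DBS.vcf)    # the chr1:100-101 pair, merged into one row
print(res$ThreePlus)  # the chr2:50-52 run, merged into one row
```

In this toy input the variants at chr1:200 and chr1:201 are adjacent but their VAFs differ by far more than 0.02, so the sketch keeps them as two separate SBSs rather than merging them into a DBS.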