SBSs are single base substitutions, e.g. C>T, A>G, ... DBSs are double base substitutions, e.g. CC>TT, AT>GG, ... Variants involving more than 2 consecutive bases are rare, so this function just records them. These would be variants such as ATG>CCT, AGAT>TCTA, ...
SplitStrelkaSBSVCF(vcf.df, max.vaf.diff = 0.02)
vcf.df
: An in-memory data frame containing the contents of a Strelka VCF file.
max.vaf.diff
: The maximum difference in variant allele frequency (VAF); the default value is 0.02.
A list of in-memory objects with the elements:
SBS.vcf
: Data frame of pure SBS mutations -- no DBS or 3+BS
mutations.
DBS.vcf
: Data frame of pure DBS mutations -- no SBS or 3+BS
mutations.
ThreePlus
: Data table with the key CHROM, LOW.POS, HIGH.POS
and additional information (reference sequence, alternative sequence,
context, etc.). The additional information is not fully implemented at this
point because of limited immediate biological interest.
multiple.alt
: Rows with multiple alternate alleles (removed
from SBS.vcf, etc.).
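
To make the shape of the return value concrete, here is a minimal base-R sketch of the kind of split described above. It is not the package's implementation. The helper `split_toy_vcf`, the toy column set (including a precomputed `VAF` column), and the merging rule (adjacent positions on the same chromosome whose VAFs differ by at most `max.vaf.diff` are treated as one event) are all assumptions made for illustration.

```r
# Toy stand-in for a Strelka SBS VCF. The columns and values are
# illustrative assumptions, not the real input format.
vcf.df <- data.frame(
  CHROM = c("1", "1", "1", "1", "2", "2", "2", "2"),
  POS   = c(100, 101, 200, 300, 50, 51, 52, 90),
  REF   = c("C", "C", "A", "G", "A", "T", "G", "T"),
  ALT   = c("T", "T", "G", "A,T", "C", "C", "T", "A"),
  VAF   = c(0.30, 0.31, 0.10, 0.20, 0.40, 0.40, 0.41, 0.25),
  stringsAsFactors = FALSE
)

# Hypothetical helper: NOT the package function, only a sketch of the idea.
split_toy_vcf <- function(vcf.df, max.vaf.diff = 0.02) {
  # Set aside rows that list more than one alternate allele.
  is.multi <- grepl(",", vcf.df$ALT, fixed = TRUE)
  multiple.alt <- vcf.df[is.multi, , drop = FALSE]
  df <- vcf.df[!is.multi, , drop = FALSE]
  df <- df[order(df$CHROM, df$POS), , drop = FALSE]

  # A row joins the previous row's run if it is on the same chromosome,
  # at the next position, and has a similar VAF (assumed rule).
  n <- nrow(df)
  idx <- seq_len(n)[-1]
  joins.prev <- rep(FALSE, n)
  joins.prev[idx] <- df$CHROM[idx] == df$CHROM[idx - 1] &
    df$POS[idx] == df$POS[idx - 1] + 1 &
    abs(df$VAF[idx] - df$VAF[idx - 1]) <= max.vaf.diff
  run.id <- cumsum(!joins.prev)

  runs <- split(df, run.id)
  sizes <- vapply(runs, nrow, integer(1))

  # Collapse a run of adjacent SBS rows into a single multi-base variant.
  merge_run <- function(rows) {
    data.frame(
      CHROM = rows$CHROM[1],
      LOW.POS = min(rows$POS),
      HIGH.POS = max(rows$POS),
      REF = paste(rows$REF, collapse = ""),
      ALT = paste(rows$ALT, collapse = ""),
      stringsAsFactors = FALSE
    )
  }
  bind <- function(lst) if (length(lst) > 0) do.call(rbind, lst) else NULL

  list(
    SBS.vcf = bind(runs[sizes == 1]),
    DBS.vcf = bind(lapply(runs[sizes == 2], merge_run)),
    ThreePlus = bind(lapply(runs[sizes >= 3], merge_run)),
    multiple.alt = multiple.alt
  )
}

res <- split_toy_vcf(vcf.df)
str(res, max.level = 1)
```

With this toy input, the two adjacent C>T calls on chromosome 1 are merged into a CC>TT entry in `DBS.vcf`, the three adjacent calls on chromosome 2 become an ATG>CCT entry in `ThreePlus`, and the row with `ALT` equal to `A,T` is set aside in `multiple.alt`.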