SBSs are single base substitutions, e.g. C>T, A>G, ... DBSs are double base substitutions, e.g. CC>TT, AT>GG, ... Variants involving more than 2 consecutive bases, such as ATG>CCT or AGAT>TCTA, are rare, so this function only records them.
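The categories above can be illustrated by labelling a substitution from the length of its reference and alternate alleles. This is a minimal base R sketch, not the package's implementation: the helper name `classify_substitution` is hypothetical, and it assumes that REF and ALT already hold the complete substitution and have equal lengths.

```r
# Hypothetical helper, not part of the package: label a substitution by
# the number of consecutive bases it replaces.
classify_substitution <- function(ref, alt) {
  stopifnot(nchar(ref) == nchar(alt))  # substitutions only, no indels
  n <- nchar(ref)
  ifelse(n == 1, "SBS", ifelse(n == 2, "DBS", "ThreePlus"))
}

classify_substitution(c("C", "CC", "ATG", "AGAT"),
                      c("T", "TT", "CCT", "TCTA"))
```

The call classifies the four example variants from the text in one vectorised pass.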
SplitListOfStrelkaSBSVCFs(list.of.vcfs)
list.of.vcfs
: A list of in-memory data frames containing Strelka SBS VCF file contents.
A list of in-memory objects with the elements:
SBS.vcfs
: List of data frames of pure SBS mutations -- no DBS or 3+BS mutations.
DBS.vcfs
: List of data frames of pure DBS mutations -- no SBS or 3+BS mutations.
ThreePlus
: List of data tables with the key CHROM, LOW.POS, HIGH.POS and additional information (reference sequence, alternative sequence, context, etc.). The additional information is not fully implemented at this point because of limited immediate biological interest.
multiple.alt
: Rows with multiple alternate alleles (removed from SBS.vcfs etc.)
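The shape of the returned list can be sketched on toy data. The code below is an illustration only and does not call the package: `toy.vcf` and `split_one_vcf` are hypothetical names, the toy data frame has far fewer columns than a real Strelka VCF, variants are classified purely by REF length, and the ThreePlus element simply keeps the original rows rather than building a keyed table with LOW.POS and HIGH.POS. Keeping multiple.alt as one entry per input VCF is also an assumption.

```r
# Toy stand-in for one in-memory VCF; real Strelka VCFs carry more columns.
toy.vcf <- data.frame(
  CHROM = c("1", "1", "2", "2"),
  POS   = c(100L, 250L, 300L, 410L),
  REF   = c("C", "CC", "ATG", "A"),
  ALT   = c("T", "TT", "CCT", "G,T"),
  stringsAsFactors = FALSE
)

# Hypothetical illustration only: split one data frame by variant class.
split_one_vcf <- function(vcf) {
  multi <- grepl(",", vcf$ALT, fixed = TRUE)  # more than one alternate allele
  n <- nchar(vcf$REF)
  list(SBS          = vcf[!multi & n == 1, ],
       DBS          = vcf[!multi & n == 2, ],
       ThreePlus    = vcf[!multi & n > 2, ],
       multiple.alt = vcf[multi, ])
}

# Split every VCF, then regroup so each element holds one entry per input.
list.of.vcfs <- list(sample1 = toy.vcf, sample2 = toy.vcf[1:2, ])
per.vcf <- lapply(list.of.vcfs, split_one_vcf)
result <- list(
  SBS.vcfs     = lapply(per.vcf, `[[`, "SBS"),
  DBS.vcfs     = lapply(per.vcf, `[[`, "DBS"),
  ThreePlus    = lapply(per.vcf, `[[`, "ThreePlus"),
  multiple.alt = lapply(per.vcf, `[[`, "multiple.alt")
)
str(result, max.level = 2)
```

Rows with several alternate alleles are set aside first, so they never appear in the SBS, DBS, or ThreePlus elements.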