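Split an in-memory SBS VCF into single base substitutions and double base substitutions. SBSs are single base substitutions, e.g. C>T, A>G, and so on. DBSs are double base substitutions, e.g. CC>TT, AT>GG, and so on. Variants involving more than two consecutive bases (3+BSs) are rare, so this function only records them rather than classifying them further. Examples of such variants are ATG>CCT, AGAT>TCTA, and so on.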
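Grouping variants by the number of consecutive substituted bases can be sketched in R as follows. This is a hypothetical helper, `classify_runs()`, written for illustration only. It is not the ICAMS implementation, it ignores VAFs entirely (see `max.vaf.diff`), and it assumes VCF-style `CHROM` and `POS` columns.

```r
# Hypothetical sketch (not ICAMS code): group substitutions at consecutive
# positions on the same chromosome into runs and label each run by length:
# 1 base -> "SBS", 2 bases -> "DBS", 3 or more bases -> "3+BS".
classify_runs <- function(vcf.df) {
  # Sort so that neighbouring rows are neighbouring genomic positions.
  vcf.df <- vcf.df[order(vcf.df$CHROM, vcf.df$POS), , drop = FALSE]
  n <- nrow(vcf.df)
  if (n == 0L) {
    vcf.df$run.id <- integer(0)
    vcf.df$class <- character(0)
    return(vcf.df)
  }
  # A new run starts when the chromosome changes or the position is not
  # exactly one greater than the previous row's position.
  new.run <- c(TRUE, vcf.df$CHROM[-1] != vcf.df$CHROM[-n] |
                       vcf.df$POS[-1] != vcf.df$POS[-n] + 1)
  run.id <- cumsum(new.run)
  # Replace every row's value by the size of the run it belongs to.
  run.len <- ave(vcf.df$POS, run.id, FUN = length)
  vcf.df$run.id <- run.id
  vcf.df$class <- ifelse(run.len == 1, "SBS",
                         ifelse(run.len == 2, "DBS", "3+BS"))
  vcf.df
}
```

For example, positions 100 and 101 on chromosome 1 form one DBS run, a lone position 200 stays an SBS, and positions 300 to 302 form a single 3+BS run.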
SplitSBSVCF(vcf.df, max.vaf.diff = 0.02, name.of.VCF = NULL)
A list of in-memory objects with the elements:
SBS.vcf: Data frame of pure SBS mutations (no DBS or 3+BS mutations).
DBS.vcf: Data frame of pure DBS mutations (no SBS or 3+BS mutations).
discarded.variants: Non-NULL only if some variants were excluded from the analysis. See the additional column discarded.reason for the reason each variant was excluded.
vcf.df: An in-memory data frame containing the contents of an SBS VCF file.
max.vaf.diff: The maximum allowed difference in VAF (variant allele frequency); the default is 0.02. If the absolute difference between the VAFs of adjacent SBSs is greater than max.vaf.diff, these adjacent SBSs are likely to be "merely" asynchronous single base mutations, as opposed to a simultaneous doublet mutation or a variant involving more than two consecutive bases.
name.of.VCF: Name of the VCF file (default NULL).
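The max.vaf.diff rule can be illustrated with a minimal R sketch for pairs of adjacent SBSs. The helper `flag_async_pairs()` and the numeric `VAF` column are assumptions made for this example; ICAMS may compute or store VAFs differently and may treat runs of three or more bases in its own way.

```r
# Hypothetical sketch (not ICAMS code): for each pair of adjacent SBSs
# (same chromosome, positions differing by exactly 1), compute the absolute
# VAF difference and flag the pair as asynchronous when it exceeds
# max.vaf.diff. Assumes columns CHROM, POS and a numeric VAF.
flag_async_pairs <- function(vcf.df, max.vaf.diff = 0.02) {
  vcf.df <- vcf.df[order(vcf.df$CHROM, vcf.df$POS), , drop = FALSE]
  n <- nrow(vcf.df)
  if (n < 2L) {
    return(data.frame(CHROM = character(0), pos1 = numeric(0),
                      pos2 = numeric(0), vaf.diff = numeric(0),
                      asynchronous = logical(0)))
  }
  i <- seq_len(n - 1L)
  # Keep only rows whose next row is the immediately following base.
  adjacent <- vcf.df$CHROM[i] == vcf.df$CHROM[i + 1L] &
              vcf.df$POS[i + 1L] == vcf.df$POS[i] + 1
  i <- i[adjacent]
  vaf.diff <- abs(vcf.df$VAF[i] - vcf.df$VAF[i + 1L])
  data.frame(CHROM = vcf.df$CHROM[i],
             pos1 = vcf.df$POS[i], pos2 = vcf.df$POS[i + 1L],
             vaf.diff = vaf.diff,
             # TRUE: likely two asynchronous SBSs; FALSE: candidate DBS.
             asynchronous = vaf.diff > max.vaf.diff)
}
```

Because the rule uses a strict "greater than" comparison, a pair whose VAFs differ by exactly max.vaf.diff would remain a candidate doublet, although floating-point rounding can push a nominal difference of 0.02 slightly to either side.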