SBSs are single base substitutions, e.g. C>T, A>G, ... DBSs are double base substitutions, e.g. CC>TT, AT>GG, ... Variants involving more than two consecutive bases, such as ATG>CCT, AGAT>TCTA, ..., are rare, so this function just records them.
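The three categories above differ only in how many consecutive bases are replaced. A minimal sketch of that distinction follows; `ClassifySubstitution` is a hypothetical helper written for illustration and is not part of the documented package.

```r
# Hypothetical helper (an assumption for illustration, not a package function):
# classify a substitution by the number of consecutive bases it changes.
ClassifySubstitution <- function(ref, alt) {
  # Substitutions only: REF and ALT must have the same length (no indels).
  stopifnot(nchar(ref) == nchar(alt))
  n <- nchar(ref)
  ifelse(n == 1, "SBS", ifelse(n == 2, "DBS", "3+BS"))
}

# The example variants from the description above.
ref <- c("C", "CC", "ATG", "AGAT")
alt <- c("T", "TT", "CCT", "TCTA")
print(data.frame(ref, alt, class = ClassifySubstitution(ref, alt)))
```

Here C>T is classified as an SBS, CC>TT as a DBS, and the two longer variants as 3+BS.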
SplitStrelkaSBSVCF(
vcf.df,
max.vaf.diff = 0.02,
name.of.VCF = NULL,
always.merge.SBS = FALSE
)
A list of in-memory objects with the elements:
SBS.vcf
: Data frame of pure SBS mutations, with no DBS or 3+BS mutations.
DBS.vcf
: Data frame of pure DBS mutations, with no SBS or 3+BS mutations.
discarded.variants
: Non-NULL only if some variants were excluded from the analysis. See the added column discarded.reason for more details.
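Because discarded.variants can be NULL, code that consumes the returned list may want to check for that before reading discarded.reason. The sketch below uses a mock list with the documented element names; the rows and the columns CHROM, POS, REF and ALT are made up for illustration, and a real result carries whatever columns the input VCF data frame had.

```r
# Mock return value (an assumption for illustration): same element names as
# documented above, but with invented rows and only a few VCF columns.
split.res <- list(
  SBS.vcf = data.frame(CHROM = "1", POS = 100L, REF = "C", ALT = "T"),
  DBS.vcf = data.frame(CHROM = "1", POS = 200L, REF = "CC", ALT = "TT"),
  discarded.variants = NULL
)

# Tabulate the discard reasons only when something was actually discarded.
if (is.null(split.res$discarded.variants)) {
  message("No variants were discarded")
} else {
  print(table(split.res$discarded.variants$discarded.reason))
}
```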
vcf.df
: An in-memory data frame containing the contents of a Strelka VCF file.
max.vaf.diff
: The maximum allowed difference in variant allele frequency (VAF) between adjacent SBSs; the default is 0.02. If the absolute difference between the VAFs of adjacent SBSs is greater than max.vaf.diff, these adjacent SBSs are likely to be merely asynchronous single base mutations, as opposed to a simultaneous doublet mutation or a variant involving more than two consecutive bases. Use a negative value (e.g. -1) to suppress merging of adjacent SBSs into DBSs.
name.of.VCF
: Name of the VCF file.
always.merge.SBS
: If TRUE, merge adjacent SBSs as DBSs regardless of their VAFs and regardless of the value of max.vaf.diff.
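The interaction between max.vaf.diff and always.merge.SBS can be sketched as a pairwise decision rule. This is only an illustration of the rule as described above, not the function's actual implementation: `ShouldMergeAdjacentSBS` is a hypothetical helper, it takes scalar VAFs, and it does not handle runs of more than two adjacent SBSs or missing VAFs.

```r
# Hypothetical helper (an assumption for illustration): decide whether two
# adjacent SBSs with the given VAFs would be merged into one DBS.
ShouldMergeAdjacentSBS <- function(vaf1, vaf2,
                                   max.vaf.diff = 0.02,
                                   always.merge.SBS = FALSE) {
  # always.merge.SBS overrides the VAF comparison entirely.
  if (always.merge.SBS) return(TRUE)
  # Merge only when the VAFs differ by no more than max.vaf.diff.
  # A negative max.vaf.diff can never be satisfied, so nothing is merged.
  abs(vaf1 - vaf2) <= max.vaf.diff
}

ShouldMergeAdjacentSBS(0.31, 0.30)                           # similar VAFs
ShouldMergeAdjacentSBS(0.30, 0.10)                           # dissimilar VAFs
ShouldMergeAdjacentSBS(0.30, 0.30, max.vaf.diff = -1)        # merging suppressed
ShouldMergeAdjacentSBS(0.30, 0.10, always.merge.SBS = TRUE)  # merging forced
```

The first and last calls return TRUE, and the second and third return FALSE.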