Split a VCF into SBS, DBS, and ID VCFs, plus a list of other mutations
SplitOneVCF(
  vcf.df,
  max.vaf.diff = 0.02,
  name.of.VCF = NULL,
  always.merge.SBS = FALSE,
  chr.names.to.process = NULL
)
A list with 3 in-memory VCFs and discarded variants that were not incorporated into the first 3 VCFs:
* SBS: VCF with only single base substitutions.
* DBS: VCF with only doublet base substitutions.
* ID: VCF with only small insertions and deletions.
* discarded.variants: Non-NULL only if there are variants that were excluded from the analysis. See the added extra column discarded.reason for more details.
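Because discarded.variants can be NULL, code that consumes the return value should check for that before using it. The sketch below uses a hand-built stand-in for the returned list rather than a real call to SplitOneVCF; every column name except discarded.reason, and the reason text itself, are assumptions made for illustration.

```r
# Hypothetical stand-in for the list returned by SplitOneVCF().
# Column names other than discarded.reason are illustrative assumptions.
res <- list(
  SBS = data.frame(CHROM = "1", POS = 100L, REF = "A", ALT = "T"),
  DBS = data.frame(CHROM = "1", POS = 200L, REF = "AC", ALT = "GT"),
  ID  = data.frame(CHROM = "2", POS = 300L, REF = "A", ALT = "AT"),
  discarded.variants = data.frame(
    CHROM = "3", POS = 400L, REF = "ACG", ALT = "TGA",
    discarded.reason = "hypothetical reason text"
  )
)

# Number of variants in each of the three in-memory VCFs.
n.variants <- vapply(res[c("SBS", "DBS", "ID")], nrow, integer(1))

# discarded.variants may be NULL, so test before counting rows.
n.discarded <- if (is.null(res$discarded.variants)) {
  0L
} else {
  nrow(res$discarded.variants)
}
```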
* vcf.df: An in-memory data.frame representing a VCF, including VAFs, which are added by ReadVCF.
* max.vaf.diff: The maximum allowed difference in VAF between adjacent SBSs; the default is 0.02. If the absolute difference between the VAFs of adjacent SBSs is greater than max.vaf.diff, those SBSs are likely to be "merely" asynchronous single base mutations, as opposed to a simultaneous doublet mutation or a variant involving more than two consecutive bases. Use a negative value (e.g. -1) to suppress merging adjacent SBSs into DBSs.
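The VAF rule described for max.vaf.diff can be sketched on toy data. This is not the ICAMS implementation: the helper is_dbs_candidate, the column names, and the use of `<=` at the threshold are assumptions chosen to match the description above.

```r
# Hypothetical toy data: rows 1-2 and rows 3-4 are adjacent positions.
sbs <- data.frame(
  CHROM = c("1", "1", "2", "2"),
  POS   = c(100L, 101L, 500L, 501L),
  VAF   = c(0.30, 0.31, 0.40, 0.10),
  stringsAsFactors = FALSE
)

# Illustrative helper (not part of ICAMS): for each row i, report whether
# rows i and i + 1 could be merged into a DBS candidate.
is_dbs_candidate <- function(df, max.vaf.diff = 0.02) {
  n <- nrow(df)
  if (n < 2) return(logical(0))
  first  <- seq_len(n - 1)
  second <- first + 1
  same.chrom <- df$CHROM[first] == df$CHROM[second]
  adjacent   <- df$POS[second] - df$POS[first] == 1
  # An absolute VAF difference above the threshold suggests two
  # independent SBSs rather than one doublet event.
  vaf.close  <- abs(df$VAF[first] - df$VAF[second]) <= max.vaf.diff
  same.chrom & adjacent & vaf.close
}

default.merge <- is_dbs_candidate(sbs)                     # threshold 0.02
no.merge      <- is_dbs_candidate(sbs, max.vaf.diff = -1)  # merging suppressed
```

With the default threshold only the first pair qualifies, since its VAFs differ by 0.01 while the last pair differs by 0.30. A negative threshold can never be met by an absolute difference, which is why it turns merging off.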
* name.of.VCF: Name of the VCF file.
* always.merge.SBS: If TRUE, merge adjacent SBSs into DBSs regardless of their VAFs and regardless of the value of max.vaf.diff.
* chr.names.to.process: A character vector specifying the names of the chromosomes in the VCF whose variants will be kept and processed; variants on all other chromosomes will be discarded. If NULL (default), all variants will be kept except those on chromosomes whose names contain any of the strings "GL", "KI", "random", "Hs", "M", "JH", "fix", "alt".
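The chromosome selection described for chr.names.to.process can be sketched as follows. The helper keep_chrom is hypothetical and not part of ICAMS; treating the excluded strings as case-sensitive substrings is an assumption based on the wording above.

```r
# Illustrative helper (not part of ICAMS): TRUE for chromosome names
# whose variants would be kept.
keep_chrom <- function(chrom, chr.names.to.process = NULL) {
  if (!is.null(chr.names.to.process)) {
    # Explicit list: keep exactly the named chromosomes.
    return(chrom %in% chr.names.to.process)
  }
  # Default: drop names containing any of the listed strings.
  excluded <- c("GL", "KI", "random", "Hs", "M", "JH", "fix", "alt")
  pattern  <- paste(excluded, collapse = "|")
  !grepl(pattern, chrom)
}

chrom <- c("1", "chrX", "MT", "GL000207.1", "chr1_KI270706v1_random")

kept.default  <- keep_chrom(chrom)
kept.explicit <- keep_chrom(chrom, chr.names.to.process = c("1", "MT"))
```

Under the default, "MT" is dropped because its name contains "M"; passing it explicitly in chr.names.to.process is one way to retain mitochondrial variants.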