Read and split VCF files
ReadAndSplitVCFs(
  files,
  variant.caller = "unknown",
  num.of.cores = 1,
  names.of.VCFs = NULL,
  tumor.col.names = NA,
  filter.status = NULL,
  get.vaf.function = NULL,
  ...,
  max.vaf.diff = 0.02,
  suppress.discarded.variants.warnings = TRUE
)
A list containing the following objects:
SBS
: List of VCFs with only single base substitutions.
DBS
: List of VCFs with only doublet base substitutions.
ID
: List of VCFs with only small insertions and deletions.
discarded.variants
: Non-NULL only if some variants were excluded from the analysis. See the added column discarded.reason for more details.
Arguments:
files
: Character vector of file paths to the VCF files.
variant.caller
: Name of the variant caller that produced the VCF. Can be "strelka", "mutect", "freebayes", or "unknown". This information is needed to calculate the VAFs (variant allele frequencies). If variant.caller is "unknown" (default) and get.vaf.function is NULL, then VAF and read depth will be NA. If variant.caller is "mutect", adjacent SBSs are not merged into DBSs.
num.of.cores
: The number of cores to use. On Windows, only num.of.cores = 1 is available.
names.of.VCFs
: Character vector of names of the VCF files. The order of names in names.of.VCFs should match the order of the VCF file paths in files. If NULL (default), the names are derived from the file paths: everything up to and including the last path separator (if any) is removed, and so is the file extension together with its leading dot.
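The default naming rule described for names.of.VCFs can be approximated with base R. This is a minimal sketch of the rule as stated above, not necessarily the code ICAMS runs internally; the file paths are hypothetical.

```r
# Hypothetical paths, used only for illustration.
files <- c("/data/vcfs/sample1.vcf", "results/tumor.A.vcf", "sample3.vcf")

# Drop everything up to and including the last path separator ...
file.names <- basename(files)

# ... then drop the extension together with its leading dot.
default.names <- tools::file_path_sans_ext(file.names)

print(default.names)
```

Only the final extension is removed, so a dot inside the base name (as in "tumor.A.vcf") is kept.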
tumor.col.names
: Optional. Only applicable to Mutect VCFs. Character vector of column names in Mutect VCFs which contain the tumor sample information. The order of names in tumor.col.names should match the order of the Mutect VCFs specified in files. If tumor.col.names is NA (default), this function will use the 10th column in all the Mutect VCFs to calculate VAFs. See GetMutectVAF for more details.
filter.status
: The status indicating that a variant has passed all filters, for example "PASS". Variants that do not have the specified filter.status in the FILTER column of the VCF will be removed. If NULL (default), no variants will be removed from the original VCF.
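The effect of filter.status can be pictured on a small data frame. The sketch below is illustrative only: KeepByFilterStatus is a hypothetical helper, not an ICAMS function, and the toy VCF rows are made up.

```r
# Toy stand-in for a few VCF columns; all values are made up.
vcf <- data.frame(
  CHROM  = c("1", "1", "2"),
  POS    = c(100L, 250L, 300L),
  REF    = c("A", "C", "G"),
  ALT    = c("T", "G", "A"),
  FILTER = c("PASS", "germline_risk", "PASS"),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the documented behavior:
# NULL keeps every row, otherwise only rows whose FILTER matches are kept.
KeepByFilterStatus <- function(vcf, filter.status = NULL) {
  if (is.null(filter.status)) {
    return(vcf)
  }
  vcf[vcf$FILTER %in% filter.status, , drop = FALSE]
}

passed     <- KeepByFilterStatus(vcf, filter.status = "PASS")
unfiltered <- KeepByFilterStatus(vcf)

print(nrow(passed))
print(nrow(unfiltered))
```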
get.vaf.function
: Optional. Only applicable when variant.caller is "unknown". Function to calculate VAF (variant allele frequency) and read depth information from the original VCF. See GetMutectVAF for an example. If NULL (default) and variant.caller is "unknown", then VAF and read depth will be NA.
...
: Optional arguments to get.vaf.function.
max.vaf.diff
: Not applicable if variant.caller = "mutect". The maximum difference of VAF; the default value is 0.02. If the absolute difference of VAFs for adjacent SBSs is bigger than max.vaf.diff, then these adjacent SBSs are likely to be "merely" asynchronous single base mutations, as opposed to a simultaneous doublet mutation or variants involving more than two consecutive bases.
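The max.vaf.diff criterion amounts to a simple comparison. The following sketch assumes two SBSs on the same chromosome; AreLikelyOneDoublet is a hypothetical helper written for this illustration and is not part of ICAMS, whose merging logic may involve further checks.

```r
# Hypothetical helper, not an ICAMS function. Assumes both SBSs lie on the
# same chromosome and that pos2 is the position of the downstream variant.
AreLikelyOneDoublet <- function(pos1, vaf1, pos2, vaf2, max.vaf.diff = 0.02) {
  adjacent    <- (pos2 - pos1) == 1
  similar.vaf <- abs(vaf1 - vaf2) <= max.vaf.diff
  adjacent && similar.vaf
}

# Adjacent positions, VAFs differ by about 0.01: a plausible doublet.
similar <- AreLikelyOneDoublet(1000, 0.31, 1001, 0.30)

# Adjacent positions, VAFs differ by about 0.19: more likely two separate SBSs.
different <- AreLikelyOneDoublet(1000, 0.31, 1001, 0.12)

print(c(similar = similar, different = different))
```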
suppress.discarded.variants.warnings
: Logical. Whether to suppress warning messages showing information about the discarded variants. Default is TRUE.
See also: VCFsToCatalogs
Examples:
file <- c(system.file("extdata/Mutect-vcf",
                      "Mutect.GRCh37.s1.vcf",
                      package = "ICAMS"))
list.of.vcfs <- ReadAndSplitVCFs(file, variant.caller = "mutect")
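One way to look at the returned object is sketched below. It assumes the ICAMS package is installed and relies only on the element names listed in the return value description; the printed results depend on the input file and are not shown here.

```r
# Runs only if the ICAMS package is installed.
if (requireNamespace("ICAMS", quietly = TRUE)) {
  file <- system.file("extdata/Mutect-vcf",
                      "Mutect.GRCh37.s1.vcf",
                      package = "ICAMS")
  list.of.vcfs <- ICAMS::ReadAndSplitVCFs(file, variant.caller = "mutect")

  # Top-level elements: SBS, DBS, ID and, possibly, discarded.variants.
  print(names(list.of.vcfs))

  # SBS, DBS and ID are each a list of VCFs.
  print(length(list.of.vcfs$SBS))

  # discarded.variants is NULL unless some variants were excluded.
  print(is.null(list.of.vcfs$discarded.variants))
}
```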