Read and split VCF files
ReadAndSplitVCFs(
files,
variant.caller = "unknown",
num.of.cores = 1,
names.of.VCFs = NULL,
tumor.col.names = NA,
filter.status = DefaultFilterStatus(variant.caller),
get.vaf.function = NULL,
...,
max.vaf.diff = 0.02,
suppress.discarded.variants.warnings = TRUE,
always.merge.SBS = FALSE,
chr.names.to.process = NULL
)
A list containing the following objects:
SBS: List of VCFs with only single base substitutions.
DBS: List of VCFs with only doublet base substitutions.
ID: List of VCFs with only small insertions and deletions.
discarded.variants: Non-NULL only if there are variants
that were excluded from the analysis. See the added extra column
discarded.reason for more details.
files: Character vector of file paths to the VCF files.
variant.caller: Name of the variant caller that produced the VCF; one of
"strelka", "mutect", "freebayes", or "unknown". This information is
needed to calculate the VAFs (variant allele frequencies). If
variant.caller is "unknown" (default) and get.vaf.function is NULL,
then VAF and read depth will be NA. If variant.caller is "mutect",
adjacent SBSs are not merged into DBSs.
num.of.cores: The number of cores to use. On Windows, only
num.of.cores = 1 is supported.
names.of.VCFs: Optional. Character vector of names of the VCF files.
The order of names in names.of.VCFs should match the order of VCF
file paths in files. If NULL (default), the names are derived from
files: everything up to and including the last path separator (if any)
is removed, and so is the file extension (with its leading dot). The
remaining string is used as the name of each VCF file.
tumor.col.names: Optional. Only applicable to Mutect VCFs.
Vector of column names or column indices in Mutect VCFs which
contain the tumor sample information. The order of elements in
tumor.col.names should match the order of Mutect VCFs
specified in files. If tumor.col.names is NA (default), this
function will use the 10th column in all the Mutect VCFs to
calculate VAFs. See GetMutectVAF for more details.
filter.status: The character string in column FILTER of the VCF
that indicates that a variant has passed all the variant caller's filters.
Variants (lines in the VCF) for which the value in column FILTER
does not equal filter.status are silently excluded from the output.
The internal function DefaultFilterStatus tries to infer
filter.status based on variant.caller. If variant.caller is
"unknown", the user must specify filter.status explicitly. If
filter.status = NULL, all variants are retained. If there is no
FILTER column in the VCF, all variants are retained with a warning.
get.vaf.function: Optional. Only applicable when variant.caller is
"unknown". Function to calculate VAF (variant allele frequency) and read
depth information from the original VCF. See GetMutectVAF for an example.
If NULL (default) and variant.caller is "unknown", then VAF
and read depth will be NA.
...: Optional arguments to get.vaf.function.
max.vaf.diff: Not applicable if variant.caller = "mutect". The maximum
allowed difference in VAF between adjacent SBSs; the default is 0.02. If
the absolute difference of VAFs for adjacent SBSs is greater than
max.vaf.diff, then these adjacent SBSs are likely to be "merely"
asynchronous single base mutations, as opposed to a simultaneous doublet
mutation or variants involving more than two consecutive bases. Use a
negative value (e.g. -1) to suppress merging adjacent SBSs into DBSs.
suppress.discarded.variants.warnings: Logical. Whether to suppress warning messages about the discarded variants. Default is TRUE.
always.merge.SBS: If TRUE, merge adjacent SBSs as DBSs regardless of
VAFs, regardless of the value of max.vaf.diff, and regardless of the
value of get.vaf.function. It is an error to set this to TRUE when
variant.caller = "mutect".
chr.names.to.process: A character vector specifying the chromosome names in the VCF whose variants will be kept and processed; variants on other chromosomes are discarded. If NULL (default), all variants are kept except those on chromosomes whose names contain any of the strings "GL", "KI", "random", "Hs", "M", "JH", "fix", or "alt".
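The default chromosome exclusion rule described for chr.names.to.process can be illustrated with a small base R sketch. This is an approximation under stated assumptions, not the package's actual implementation: the chromosome names are made up, and whether ICAMS matches the strings as regular expressions or as fixed substrings is not stated in this documentation.

```r
# Hypothetical chromosome names, for illustration only.
chr.names <- c("1", "2", "X", "Y", "MT", "chrM",
               "GL000207.1", "chr1_KI270706v1_random")

# Substrings listed in the documentation for the default exclusion rule.
excluded.strings <- c("GL", "KI", "random", "Hs", "M", "JH", "fix", "alt")

# Assumption: a name is excluded if it contains any of the substrings.
# grepl() is case-sensitive by default.
pattern <- paste(excluded.strings, collapse = "|")
is.excluded <- grepl(pattern, chr.names)

kept <- chr.names[!is.excluded]
print(kept)
```

Under this reading, any name containing an uppercase "M" (for example "MT" or "chrM") is dropped, so mitochondrial variants would have to be requested explicitly through chr.names.to.process.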
See also: VCFsToCatalogs
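library(ICAMS)
# Locate an example Mutect VCF shipped with the ICAMS package
file <- c(system.file("extdata/Mutect-vcf",
                      "Mutect.GRCh37.s1.vcf",
                      package = "ICAMS"))
# Read the VCF and split it into SBS, DBS, and ID VCFs
list.of.vcfs <- ReadAndSplitVCFs(file, variant.caller = "mutect")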
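When names.of.VCFs is left at NULL, the VCF names come from the file paths: the directory part is stripped, then the extension. The base R sketch below approximates that rule. DefaultVCFNames is a hypothetical helper written for this illustration, not an ICAMS function, and the paths are made up.

```r
# Hypothetical helper, not part of ICAMS: approximates the documented
# default naming rule (drop the directory, then drop the extension).
DefaultVCFNames <- function(files) {
  tools::file_path_sans_ext(basename(files))
}

# Made-up paths, for illustration only.
paths <- c("/data/vcfs/Mutect.GRCh37.s1.vcf", "sample2.vcf")
vcf.names <- DefaultVCFNames(paths)
print(vcf.names)
```

Only the final extension is removed, so a file named like the example VCF above keeps the dots inside its base name.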