Create 3 SBS catalogs (96, 192, 1536), 3 DBS catalogs (78, 136, 144) and
one indel (ID) catalog from the VCFs specified by files.
VCFsToCatalogs(
  files,
  ref.genome,
  variant.caller = "unknown",
  num.of.cores = 1,
  trans.ranges = NULL,
  region = "unknown",
  names.of.VCFs = NULL,
  tumor.col.names = NA,
  filter.status = DefaultFilterStatus(variant.caller),
  get.vaf.function = NULL,
  ...,
  max.vaf.diff = 0.02,
  return.annotated.vcfs = FALSE,
  suppress.discarded.variants.warnings = TRUE,
  chr.names.to.process = NULL
)
A list containing the following objects:
catSBS96, catSBS192, catSBS1536: Matrix of 3 SBS catalogs (one each for 96, 192, and 1536).
catDBS78, catDBS136, catDBS144: Matrix of 3 DBS catalogs (one each for 78, 136, and 144).
catID: Matrix of ID (small insertions and deletions) catalog.
discarded.variants: Non-NULL only if there are variants that were excluded from the analysis. See the added extra column discarded.reason for more details.
annotated.vcfs: Non-NULL only if return.annotated.vcfs = TRUE. A list of elements:
  SBS: SBS VCF annotated by AnnotateSBSVCF with three new columns SBS96.class, SBS192.class and SBS1536.class showing the mutation class for each SBS variant.
  DBS: DBS VCF annotated by AnnotateDBSVCF with three new columns DBS78.class, DBS136.class and DBS144.class showing the mutation class for each DBS variant.
  ID: ID VCF annotated by AnnotateIDVCF with one new column ID.class showing the mutation class for each ID variant.
If trans.ranges is not provided by the user and cannot be inferred by
ICAMS, the SBS 192 and DBS 144 catalogs will not be generated. Each catalog has
attributes added. See as.catalog for more details.
files: Character vector of file paths to the VCF files.
ref.genome: A ref.genome argument as described in ICAMS.
variant.caller: Name of the variant caller that produced the VCF; one of
"strelka", "mutect", "freebayes" or "unknown". This information is needed to
calculate the VAFs (variant allele frequencies). If the variant caller is
"unknown" (default) and get.vaf.function is NULL, then VAF and read depth will
be NAs. If the variant caller is "mutect", adjacent SBSs are not merged into DBSs.
num.of.cores: The number of cores to use. Not available on Windows
unless num.of.cores = 1.
trans.ranges: Optional. If ref.genome specifies one of the BSgenome objects
BSgenome.Hsapiens.1000genomes.hs37d5
BSgenome.Hsapiens.UCSC.hg38
BSgenome.Mmusculus.UCSC.mm10
then the function will infer trans.ranges automatically. Otherwise, the
user will need to provide the necessary trans.ranges. Please refer to
TranscriptRanges for more details. If is.null(trans.ranges), transcript range
information is not added.
region: A character string designating a genomic region;
see as.catalog and ICAMS.
names.of.VCFs: Optional. Character vector of names of the VCF files.
The order of names in names.of.VCFs should match the order of the VCF
file paths in files. If NULL (default), this function removes everything
up to and including the last path separator (if any) in each element of files,
then drops the file extension (and its leading dot); the results are used
as the names of the VCF files.
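The default naming rule described above can be approximated with base R. This is only a sketch of the described behavior, not ICAMS's internal code; the file paths are hypothetical.

```r
# Hypothetical file paths, used only for illustration.
files <- c("/data/run1/tumorA.vcf", "results/tumorB.vcf", "tumorC.vcf")

# basename() drops everything up to and including the last path separator;
# tools::file_path_sans_ext() drops the final extension and its leading dot.
default.names <- tools::file_path_sans_ext(basename(files))
print(default.names)
```

If these derived names are not suitable as sample names, pass explicit names through names.of.VCFs, in the same order as files.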
tumor.col.names: Optional. Only applicable to Mutect VCFs.
Vector of column names or column indices in Mutect VCFs which
contain the tumor sample information. The order of elements in
tumor.col.names should match the order of the Mutect VCFs
specified in files. If tumor.col.names is equal to NA (default), this
function will use the 10th column in all the Mutect VCFs to calculate VAFs.
See GetMutectVAF for more details.
filter.status: The character string in column FILTER of the VCF
that indicates that a variant has passed all the variant caller's filters.
Variants (lines in the VCF) for which the value in column FILTER
does not equal filter.status are silently excluded from the output.
The internal function DefaultFilterStatus tries to infer
filter.status based on variant.caller. If variant.caller is "unknown", the
user must specify filter.status explicitly. If filter.status = NULL, all
variants are retained. If there is no FILTER column in the VCF, all variants
are retained with a warning.
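The filtering rule described above can be illustrated on a toy data frame. This is a minimal sketch, not ICAMS's implementation: KeepPassingVariants is a hypothetical helper, the toy VCF has only three columns, and "PASS" is assumed as the passing value for the sake of the example.

```r
# Toy stand-in for a parsed VCF; real VCFs have more columns.
vcf <- data.frame(
  CHROM  = c("1", "1", "2", "3"),
  POS    = c(100L, 250L, 300L, 400L),
  FILTER = c("PASS", "LowQual", "PASS", "germline"),
  stringsAsFactors = FALSE
)

# Hypothetical helper illustrating the rule described above.
KeepPassingVariants <- function(vcf, filter.status) {
  # filter.status = NULL: retain all variants.
  if (is.null(filter.status)) return(vcf)
  # No FILTER column: retain all variants, with a warning.
  if (!"FILTER" %in% colnames(vcf)) {
    warning("No FILTER column; retaining all variants")
    return(vcf)
  }
  # Otherwise keep only rows whose FILTER equals filter.status.
  vcf[vcf$FILTER == filter.status, , drop = FALSE]
}

kept <- KeepPassingVariants(vcf, filter.status = "PASS")
print(kept$POS)
```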
get.vaf.function: Optional. Only applicable when variant.caller is
"unknown". Function to calculate VAF (variant allele frequency) and read
depth information from the original VCF. See GetMutectVAF as an example.
If NULL (default) and variant.caller is "unknown", then VAF
and read depth will be NAs.
...: Optional arguments to get.vaf.function.
max.vaf.diff: Not applicable if variant.caller = "mutect". The maximum
difference of VAF; the default value is 0.02. If the absolute difference of
VAFs for adjacent SBSs is bigger than max.vaf.diff, then these adjacent SBSs
are likely to be "merely" asynchronous single base mutations, as opposed to a
simultaneous doublet mutation or variants involving more than two consecutive
bases. Use a negative value (e.g. -1) to suppress merging adjacent SBSs to DBS.
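The VAF criterion above can be sketched with base R on a toy table. This only illustrates the comparison described in the text; it is not how ICAMS merges variants internally, and the positions and VAFs are made up.

```r
# Toy table of SBSs with made-up positions and VAFs, sorted by position.
sbs <- data.frame(
  CHROM = c("1", "1", "1", "1"),
  POS   = c(100L, 101L, 500L, 501L),
  VAF   = c(0.30, 0.31, 0.40, 0.10),
  stringsAsFactors = FALSE
)
max.vaf.diff <- 0.02

# Compare each SBS with the next one in the table.
n <- nrow(sbs)
same.chrom <- sbs$CHROM[-n] == sbs$CHROM[-1]
adjacent   <- same.chrom & (sbs$POS[-1] - sbs$POS[-n] == 1L)
vaf.diff   <- abs(sbs$VAF[-1] - sbs$VAF[-n])

# A pair is a DBS candidate only if the two SBSs are adjacent and their
# VAFs differ by no more than max.vaf.diff.
dbs.candidate <- adjacent & (vaf.diff <= max.vaf.diff)
print(dbs.candidate)
```

Only the first pair (positions 100 and 101) qualifies. The pair at 500 and 501 is adjacent but its VAFs differ by 0.30, so the two SBSs stay separate. Because an absolute difference is never negative, a negative max.vaf.diff makes the comparison fail for every pair, which is why it suppresses merging.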
return.annotated.vcfs: Logical. Whether to return the annotated VCFs with additional columns showing the mutation class for each variant. Default is FALSE.
suppress.discarded.variants.warnings: Logical. Whether to suppress warning messages showing information about the discarded variants. Default is TRUE.
chr.names.to.process: A character vector specifying the chromosome names in the VCF whose variants will be kept and processed; variants on other chromosomes will be discarded. If NULL (default), all variants will be kept except those on chromosomes with names that contain the strings "GL", "KI", "random", "Hs", "M", "JH", "fix", "alt".
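The default exclusion rule described above can be approximated as a substring match. This sketch assumes a plain, case-sensitive "contains" test; the exact matching ICAMS performs may differ, and the chromosome names are illustrative.

```r
# Illustrative chromosome names as they might appear in a VCF.
chr.names <- c("1", "chrX", "GL000207.1", "chr1_KI270706v1_random",
               "chrM", "MT", "22")

# Strings listed in the description of the default behavior.
excluded.strings <- c("GL", "KI", "random", "Hs", "M", "JH", "fix", "alt")

# Build one alternation pattern; none of the strings contains a regex
# metacharacter, so no escaping is needed.
pattern <- paste(excluded.strings, collapse = "|")
is.excluded <- grepl(pattern, chr.names)

kept <- chr.names[!is.excluded]
print(kept)
```

Note that a name such as "chrM" or "MT" is dropped under this rule because it contains "M"; to process such a chromosome, list it explicitly in chr.names.to.process.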
See https://github.com/steverozen/ICAMS/blob/v3.0.9-branch/data-raw/PCAWG7_indel_classification_2021_09_03.xlsx for additional information on ID (small insertions and deletions) mutation classification.
See the documentation for Canonicalize1Del, which first handles
deletions in homopolymers, then handles deletions in simple repeats with
longer repeat units (e.g. CACACACA, see FindMaxRepeatDel), and if the
deletion is not in a simple repeat, looks for microhomology (see FindDelMH).
See the code for the unexported function CanonicalizeID and the functions it
calls for handling of insertions.
To add or change attributes of the catalog, you can use the function attr.
For example, attr(catalog, "abundance") <- custom.abundance.
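As a small illustration of setting an attribute with attr, the sketch below uses a toy matrix in place of a real catalog. The row names, sample names, and abundance values are made up; a real abundance would need to match the catalog's mutation context.

```r
# Toy 2 x 2 matrix standing in for a catalog; real catalogs come from ICAMS.
catalog <- matrix(c(5, 3, 2, 7), nrow = 2,
                  dimnames = list(c("classA", "classB"), c("s1", "s2")))

# Made-up abundance values, for illustration only.
custom.abundance <- c(classA = 1000, classB = 2500)

# Attach the attribute, then read it back.
attr(catalog, "abundance") <- custom.abundance
print(attr(catalog, "abundance"))
```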
This function calls VCFsToSBSCatalogs, VCFsToDBSCatalogs and VCFsToIDCatalogs.
file <- c(system.file("extdata/Mutect-vcf",
                      "Mutect.GRCh37.s1.vcf",
                      package = "ICAMS"))
if (requireNamespace("BSgenome.Hsapiens.1000genomes.hs37d5", quietly = TRUE)) {
  catalogs <- VCFsToCatalogs(file, ref.genome = "hg19",
                             variant.caller = "mutect", region = "genome")
}