Create 3 SBS catalogs (96, 192, 1536), 3 DBS catalogs (78, 136, 144) and an indel (ID) catalog from the Mutect VCFs in the directory specified by dir, save the catalogs as CSV files, plot them to PDF, and generate a zip archive of all the output files.
MutectVCFFilesToZipFile(
  dir,
  zipfile,
  ref.genome,
  trans.ranges = NULL,
  region = "unknown",
  names.of.VCFs = NULL,
  tumor.col.names = NA,
  base.filename = "",
  flag.mismatches = 0,
  return.annotated.vcfs = FALSE,
  suppress.discarded.variants.warnings = TRUE
)
A list containing the following objects:
catSBS96, catSBS192, catSBS1536: Matrices of the 3 SBS catalogs (one each for 96, 192, and 1536).
catDBS78, catDBS136, catDBS144: Matrices of the 3 DBS catalogs (one each for 78, 136, and 144).
catID: Matrix of the ID (small insertion and deletion) catalog.
discarded.variants: Non-NULL only if there are variants that were excluded from the analysis. See the added extra column discarded.reason for more details.
annotated.vcfs: Non-NULL only if return.annotated.vcfs = TRUE. A list of elements:
  SBS: SBS VCF annotated by AnnotateSBSVCF with three new columns SBS96.class, SBS192.class and SBS1536.class showing the mutation class for each SBS variant.
  DBS: DBS VCF annotated by AnnotateDBSVCF with three new columns DBS78.class, DBS136.class and DBS144.class showing the mutation class for each DBS variant.
  ID: ID VCF annotated by AnnotateIDVCF with one new column ID.class showing the mutation class for each ID variant.
If trans.ranges is not provided by the user and cannot be inferred by ICAMS, the SBS 192 and DBS 144 catalogs will not be generated. Each catalog has attributes added. See as.catalog for more details.
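Because the SBS 192 and DBS 144 catalogs may not be generated, it can help to check which catalogs a result actually contains before using them. The following is a minimal sketch in base R. The `catalogs` list here is a mock built from placeholder matrices, not real ICAMS output, and the sketch assumes that a catalog which was not generated is either absent from the list or NULL; the source does not say which.

```r
# Mock return value (an assumption for illustration). A real one would come
# from MutectVCFFilesToZipFile(); the 1 x 1 matrices are placeholders only.
catalogs <- list(
  catSBS96   = matrix(0),
  catSBS1536 = matrix(0),
  catDBS78   = matrix(0),
  catDBS136  = matrix(0),
  catID      = matrix(0)
)

# Catalog names listed in the return value description.
expected <- c("catSBS96", "catSBS192", "catSBS1536",
              "catDBS78", "catDBS136", "catDBS144", "catID")

# Treat both absent and NULL elements as "not generated".
present <- names(Filter(Negate(is.null), catalogs))
missing.catalogs <- setdiff(expected, present)
print(missing.catalogs)
```

With this mock input, the two strand-aware catalogs (SBS 192 and DBS 144) are reported as missing, which is the situation described when trans.ranges is unavailable.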
dir: Pathname of the directory which contains only the Mutect VCF files. Each Mutect VCF must have the file extension ".vcf" (case insensitive), and all VCFs must share the same ref.genome and region.
zipfile: Pathname of the zip file to be created.
ref.genome: A ref.genome argument as described in ICAMS.
trans.ranges: Optional. If ref.genome specifies one of the BSgenome objects
  BSgenome.Hsapiens.1000genomes.hs37d5
  BSgenome.Hsapiens.UCSC.hg38
  BSgenome.Mmusculus.UCSC.mm10
then the function will infer trans.ranges automatically. Otherwise, the user will need to provide the necessary trans.ranges. Please refer to TranscriptRanges for more details. If is.null(trans.ranges), transcript range information is not added.
region: A character string designating a genomic region; see as.catalog and ICAMS.
names.of.VCFs: Optional. Character vector of names of the VCF files. The order of names in names.of.VCFs should match the order of VCFs listed in dir. If NULL (default), the names are derived from the file paths: everything up to and including the last path separator (if any) is removed, and the remaining file names without their extensions (and the leading dot) are used as the names of the VCF files.
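The default naming rule can be illustrated with base R. This is only a sketch of the rule as described: the file paths are hypothetical, and ICAMS's internal implementation may differ from the `basename()` and `tools::file_path_sans_ext()` combination used here.

```r
# Hypothetical VCF paths; in practice these would be the files found in dir.
vcf.paths <- c("/data/run1/sampleA.vcf", "/data/run1/sampleB.VCF")

# Remove everything up to and including the last path separator,
# then drop the extension together with its leading dot.
vcf.names <- tools::file_path_sans_ext(basename(vcf.paths))
print(vcf.names)
```

Supplying names.of.VCFs explicitly is the safer choice when file names are not meaningful sample identifiers.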
tumor.col.names: Optional. Character vector of column names in the VCFs which contain the tumor sample information. The order of names in tumor.col.names should match the order of VCFs listed in dir. If tumor.col.names is equal to NA (default), this function will use the 10th column in all the VCFs to calculate VAFs. See GetMutectVAF for more details.
base.filename: Optional. The base name of the CSV and PDF files to be produced; multiple files will be generated, each ending in x.csv or x.pdf, where x indicates the type of catalog.
flag.mismatches: Deprecated. If there are ID variants whose REF does not match the sequence extracted from ref.genome, the function will automatically discard these variants and an element discarded.variants will appear in the return value. See AnnotateIDVCF for more details.
return.annotated.vcfs: Logical. Whether to return the annotated VCFs with additional columns showing the mutation class for each variant. Default is FALSE.
suppress.discarded.variants.warnings: Logical. Whether to suppress warning messages showing information about the discarded variants. Default is TRUE.
See https://github.com/steverozen/ICAMS/blob/master/data-raw/PCAWG7_indel_classification_2021_09_03.xlsx for additional information on ID (small insertion and deletion) mutation classification.
See the documentation for Canonicalize1Del, which first handles deletions in homopolymers, then handles deletions in simple repeats with longer repeat units (e.g. CACACACA, see FindMaxRepeatDel), and if the deletion is not in a simple repeat, looks for microhomology (see FindDelMH).
See the code for the unexported function CanonicalizeID and the functions it calls for handling of insertions.
To add or change attributes of the catalog, you can use the function attr. For example, attr(catalog, "abundance") <- custom.abundance.
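As a minimal sketch of the attr usage above, the following sets and reads back an attribute in base R. The `catalog` here is a plain stand-in matrix rather than a real ICAMS catalog, and the `custom.abundance` values are hypothetical; the structure a real abundance attribute must have is described under as.catalog.

```r
# Stand-in for a catalog: a plain matrix, not a real ICAMS catalog object.
catalog <- matrix(1:4, nrow = 2,
                  dimnames = list(c("r1", "r2"), c("s1", "s2")))

# Hypothetical abundance values, for illustration only.
custom.abundance <- c(r1 = 10, r2 = 20)

# Attach the attribute, then read it back.
attr(catalog, "abundance") <- custom.abundance
print(attr(catalog, "abundance"))
```

Setting an attribute this way leaves the matrix counts themselves unchanged.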
This function calls MutectVCFFilesToCatalog, PlotCatalogToPdf, WriteCatalog and zip::zipr.
# Directory of example Mutect VCFs shipped with ICAMS
dir <- c(system.file("extdata/Mutect-vcf",
                     package = "ICAMS"))
# Run only if the required reference genome package is installed
if (requireNamespace("BSgenome.Hsapiens.1000genomes.hs37d5", quietly = TRUE)) {
  catalogs <-
    MutectVCFFilesToZipFile(dir,
                            zipfile = file.path(tempdir(), "test.zip"),
                            ref.genome = "hg19",
                            trans.ranges = trans.ranges.GRCh37,
                            region = "genome",
                            base.filename = "Mutect")
  # Clean up the zip archive created in the temporary directory
  unlink(file.path(tempdir(), "test.zip"))
}