Creates a VCF object from a variant/tally GRanges. This can then be output to a file using writeVcf. The flavor of VCF is specific for calling variants, not genotypes; see below.
variantGR2Vcf(x, sample.id, project = NULL, genome = unique(GenomicRanges::genome(x)))
x: A variant/tally GRanges.
genome: A GmapGenome object, or the name of one (in the default genome directory). This is used for obtaining the anchor base when outputting indels.
A VCF object.
A variant GRanges has an element for every unique combination of position and alternate base. A VCF object, like the file format, has a row for every position, with multiple alternate alleles collapsed within the row. This is the fundamental difference between the two data structures. We feel that the GRanges is easier to manipulate for filtering tasks, while VCF is obviously necessary for communication with external databases and tools.

Normally, despite its name, VCF is used for communicating genotype calls. We are calling variants, not genotypes, so we have extended the format accordingly.
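The difference between the two layouts can be illustrated with a minimal base R sketch. It uses a plain data frame rather than a real GRanges, and the column names and values are hypothetical toy data, not output of this package.

```r
# Hypothetical toy data: one row per (position, alternate base), which is
# the layout described above for the variant GRanges.
variants <- data.frame(
  seqnames = c("chr1", "chr1", "chr2"),
  pos      = c(100L, 100L, 250L),
  ref      = c("A", "A", "G"),
  alt      = c("C", "T", "A"),
  stringsAsFactors = FALSE
)

# Collapse to one row per position, joining the alternate alleles with
# commas, which mirrors how a single VCF row lists several ALT alleles.
collapsed <- aggregate(alt ~ seqnames + pos + ref, data = variants,
                       FUN = paste, collapse = ",")
collapsed <- collapsed[order(collapsed$seqnames, collapsed$pos), ]
print(collapsed)
```

Filtering is simpler on the uncollapsed form because each alternate allele is its own row; the collapsed form is what ends up in the file.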
Here is the mapping in detail:
The rowRanges is formed by dropping the metadata columns from the GRanges.
The colData consists of a single column, Samples, with a single row, set to 1 and named sample.id.
The exptData has an element header with element reference set to the seqlevels(x) and element samples set to sample.id. This will also include the necessary metadata for describing our extensions to the format.
The fixed table has the REF and ALT alleles, with QUAL and FILTER set to NA.
The geno list has six matrix elements, all with a single column. The first is the mandatory GT element, the genotype, which we set to NA. Then there is AD (list matrix with the read count for each REF and ALT), DP (integer matrix with the total read count), and AP (list matrix of 0/1 flags for whether REF and/or ALT was present in the data).
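As a rough illustration of the single-column matrices described for geno, the following base R sketch builds GT, AD and DP by hand. The counts are invented, the storage type chosen for GT is an assumption, and AP is left out; this is not the code used by variantGR2Vcf.

```r
# Hypothetical inputs for one sample and two positions; the first position
# has two alternate alleles, the second has one.
sample.id  <- "H1993"
ref_count  <- c(12L, 30L)
alt_counts <- list(c(5L, 3L), 7L)
total      <- c(20L, 37L)

# AD: list matrix whose cells hold the REF count followed by the ALT counts.
ad_cells <- Map(c, ref_count, alt_counts)
AD <- matrix(ad_cells, ncol = 1, dimnames = list(NULL, sample.id))

# DP: integer matrix with the total read count per position.
DP <- matrix(total, ncol = 1, dimnames = list(NULL, sample.id))

# GT: all NA, since variants rather than genotypes are being called.
# Storing it as character is an assumption made for this sketch.
GT <- matrix(NA_character_, nrow = length(total), ncol = 1,
             dimnames = list(NULL, sample.id))

geno <- list(GT = GT, AD = AD, DP = DP)
str(geno, max.level = 1)
```

A list matrix is used for AD because the number of counts per cell varies with the number of alternate alleles at that position.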
## Not run:
# vcf <- variantGR2Vcf(variants, "H1993", "example")
# writeVcf(vcf, "H1993.vcf", index = TRUE)
# ## End(Not run)