variantGR2Vcf: Create a VCF for some variants

Description

The deprecated way to create a VCF object from a variant/tally GRanges. The result can then be written to a file using writeVcf. The flavor of VCF is specific to variant calls, not genotype calls; see Details.

Usage

variantGR2Vcf(x, sample.id, project = NULL, genome = unique(GenomicRanges::genome(x)))

Arguments

x

The variant/tally GRanges.

sample.id

Unique ID for the sample in the VCF.

project

Description of the project/experiment; will be included in the VCF header.

genome

GmapGenome object, or the name of one (in the default genome directory). This is used for obtaining the anchor base when outputting indels.

Value

A VCF object.

Details

A variant GRanges has an element for every unique combination of position and alternate base. A VCF object, like the file format, has a row for every position, with multiple alternate alleles collapsed within the row. This is the fundamental difference between the two data structures. We feel that the GRanges is easier to manipulate for filtering tasks, while VCF is necessary for communication with external databases and tools.
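The difference between the two layouts can be illustrated with a small base R sketch. It uses a plain data.frame rather than a real GRanges or VCF object, and the column names (seqnames, pos, ref, alt) are hypothetical stand-ins chosen for this example, not the actual metadata columns used by the package.

```r
# Toy variant table in the "one row per position and alternate base" layout.
# Column names are illustrative only.
variants <- data.frame(
  seqnames = c("chr1", "chr1", "chr1", "chr2"),
  pos      = c(100L, 100L, 250L, 75L),
  ref      = c("A", "A", "G", "T"),
  alt      = c("C", "G", "T", "A"),
  stringsAsFactors = FALSE
)

# Collapse to a VCF-style layout: one row per position, with the alternate
# alleles at that position joined into a single comma-separated field.
collapsed <- aggregate(alt ~ seqnames + pos + ref, data = variants,
                       FUN = paste, collapse = ",")
collapsed <- collapsed[order(collapsed$seqnames, collapsed$pos), ]
print(collapsed)
```

Filtering is simpler on the first table because each alternate allele is its own row; the collapsed table is closer to what a VCF file stores.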

Normally, despite its name, VCF is used for communicating genotype calls. We are calling variants, not genotypes, so we have extended the format accordingly.

Here is the mapping in detail:

The rowRanges is formed by dropping the metadata columns from the GRanges.
The colData consists of a single column, “Samples”, with a single row, set to 1 and named sample.id.
The exptData has an element “header” with element “reference” set to seqlevels(x) and element “samples” set to sample.id. The header also includes the metadata needed to describe our extensions to the format.
The fixed table has the “REF” and “ALT” alleles, with “QUAL” and “FILTER” set to NA.
The geno list has six matrix elements, all with a single column. The first is the mandatory “GT” element, the genotype, which we set to NA. Then there are “AD” (list matrix with the read count for each REF and ALT), “DP” (integer matrix with the total read count), and “AP” (list matrix of 0/1 flags for whether REF and/or ALT was present in the data).
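The single-column matrices described for the geno list can be sketched in base R as follows. This is only a toy illustration of the shapes involved (list matrix versus integer matrix), not the function's actual implementation. The counts are made up, and deriving “DP” and “AP” from “AD” is an assumption made for the example.

```r
sample.id <- "H1993"  # sample name reused from the Examples section

# Made-up read counts at two positions: REF count first, then one per ALT.
# The second position has two alternate alleles, hence three counts.
counts <- list(c(12L, 5L), c(20L, 3L, 0L))

# "GT": mandatory genotype element, all NA.
GT <- matrix(NA_character_, nrow = length(counts), ncol = 1,
             dimnames = list(NULL, sample.id))

# "AD": a list matrix, so each cell can hold a vector of a different length.
AD <- matrix(counts, ncol = 1, dimnames = list(NULL, sample.id))

# "DP": an ordinary integer matrix. Summing AD is an assumption of this toy;
# a real total may include reads supporting neither REF nor a listed ALT.
DP <- matrix(vapply(counts, sum, integer(1)), ncol = 1,
             dimnames = list(NULL, sample.id))

# "AP": list matrix of 0/1 presence flags, derived from the counts here
# purely for illustration.
AP <- matrix(lapply(counts, function(n) as.integer(n > 0L)), ncol = 1,
             dimnames = list(NULL, sample.id))

str(AD[[2, 1]])  # the cell for the second position holds three counts
```

A list matrix is used for “AD” and “AP” because the number of values per cell depends on how many alternate alleles were collapsed into that row.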

Examples

## Not run: 
vcf <- variantGR2Vcf(variants, "H1993", "example")
writeVcf(vcf, "H1993.vcf", index = TRUE)
## End(Not run)
