## S4 method for signature 'CollapsedVCF'
genotypeToSnpMatrix(x, uncertain=FALSE, ...)
## S4 method for signature 'array'
genotypeToSnpMatrix(x, ref, alt, ...)
x: A CollapsedVCF object or an array of genotype data from the "GT", "GP", "GL" or "PL" FORMAT field of a VCF file. This array is created with a call to readVcf and can be accessed with geno().
uncertain: A logical indicating whether the genotypes to convert come from the "GT" field (uncertain=FALSE) or from the "GP", "GL" or "PL" field (uncertain=TRUE).
ref: A DNAStringSet of reference alleles.
alt: A DNAStringSetList of alternate alleles.

The function returns a list with two elements:
genotypes: The genotype data as an object of class "SnpMatrix". The columns are SNPs and the rows are samples. See ?SnpMatrix for details of the class structure.
map: A DataFrame giving the SNP names and alleles at each locus. The ignore column indicates which variants were set to NA (see the NA criteria in the details).
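genotypeToSnpMatrix converts an array of genotype calls from the
"GT", "GP", "GL" or "PL" FORMAT field of a VCF file into a
SnpMatrix. Variants that cannot be encoded are set to NA and flagged
in the ignore column of the map element. In the examples below, this
applies to variants with multiple alternate alleles, variants that
are not SNPs, and variants with no alternate allele. With
uncertain=TRUE, the conversion works from genotype posterior
probabilities (computed inside the function when the input is "GL"),
and a sample whose probabilities do not clearly favor one genotype
is coded as uncertain.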
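The recoding of "GT" strings into SnpMatrix genotype labels can be illustrated without any Bioconductor dependency. The following is a minimal sketch in base R, not the package's implementation. The helper `gt_to_ab()` is hypothetical, and it assumes a biallelic, diploid site where phased ("|") and unphased ("/") calls are treated alike, as the "0|1" -> "A/B" mapping in the example comments suggests.

```r
## Hypothetical helper: recode VCF "GT" strings into the A/B labels
## used by SnpMatrix ("A" = reference allele, "B" = alternate allele).
gt_to_ab <- function(gt) {
  ## Treat phased ("|") and unphased ("/") calls alike (an assumption).
  gt <- gsub("|", "/", gt, fixed = TRUE)
  codes <- c("0/0" = "A/A", "0/1" = "A/B", "1/0" = "A/B", "1/1" = "B/B")
  ## Any other string (missing calls, other allele indices,
  ## non-diploid calls) has no entry in 'codes' and becomes NA.
  unname(codes[gt])
}

gt_to_ab(c("0|0", "0|1", "1/0", "1/1", "./.", "0/2"))
```

A named lookup vector keeps the mapping in one place, and any genotype string without an entry falls through to NA, which mirrors how unsupported variants are reported as NA by the real function.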
## ----------------------------------------------------------------
## Non-probability based snp encoding using "GT"
## ----------------------------------------------------------------
fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
vcf <- readVcf(fl, "hg19")
## This file has no "GL" or "GP" field so we use "GT".
geno(vcf)
## Convert the "GT" FORMAT field to a SnpMatrix.
mat <- genotypeToSnpMatrix(vcf)
## The result is a list of length 2.
names(mat)
## Compare coding in the VCF file to the SnpMatrix.
geno(vcf)$GT
t(as(mat$genotypes, "character"))
## The 'ignore' column in 'map' indicates which variants
## were set to NA. Variant rs6040355 was ignored because
## it has multiple alternate alleles, microsat1 is not a
## snp, and chr20:1230237 has no alternate allele.
mat$map
## ----------------------------------------------------------------
## Probability-based encoding using "GL", "PL" or "GP"
## ----------------------------------------------------------------
## Read a vcf file with a "GL" field.
fl <- system.file("extdata", "gl_chr1.vcf", package="VariantAnnotation")
vcf <- readVcf(fl, "hg19")
geno(vcf)
## Convert the "GL" FORMAT field to a SnpMatrix
mat <- genotypeToSnpMatrix(vcf, uncertain=TRUE)
## Only 3 of the 9 variants passed the filters. The
## other 6 variants had no alternate alleles.
mat$map
## Compare genotype representations for a subset of
## samples in variant rs180734498.
## Original called genotype
geno(vcf)$GT["rs180734498", 14:16]
## Original genotype likelihoods
geno(vcf)$GL["rs180734498", 14:16]
## Posterior probability (computed inside genotypeToSnpMatrix)
GLtoGP(geno(vcf)$GL["rs180734498", 14:16, drop=FALSE])[1,]
## SnpMatrix coding.
t(as(mat$genotypes, "character"))["rs180734498", 14:16]
t(as(mat$genotypes, "numeric"))["rs180734498", 14:16]
## For samples NA11829 and NA11830, one probability is significantly
## higher than the others, so SnpMatrix calls the genotype. These
## calls match the original coding: "0|1" -> "A/B", "0|0" -> "A/A".
## Sample NA11831 was originally called as "0|1" but the probability
## of "0|0" is only a factor of 3 lower, so SnpMatrix calls it as
## "Uncertain" with an appropriate byte-level encoding.
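The posterior-probability step mentioned in the comments above can be sketched in base R. This sketch assumes that "GL" values are log10-scaled genotype likelihoods, as defined in the VCF specification, and that a flat prior is used, so the posterior is simply the normalized likelihood. The function `gl_to_posterior()` and the input values are hypothetical illustrations. They are not taken from gl_chr1.vcf, and the package's GLtoGP may differ in detail.

```r
## Hypothetical helper: convert log10 genotype likelihoods for the
## genotypes 0/0, 0/1 and 1/1 into posterior probabilities, assuming
## a flat prior over the three genotypes.
gl_to_posterior <- function(gl) {
  lik <- 10^(gl - max(gl))  # subtract the max for numerical stability
  lik / sum(lik)            # normalize so the probabilities sum to 1
}

## Made-up likelihoods for one sample at one variant.
gl <- c("0/0" = -0.48, "0/1" = 0, "1/1" = -5)
post <- gl_to_posterior(gl)
round(post, 3)

## Ratio between the two most probable genotypes. A small ratio means
## no genotype clearly dominates, so the call is uncertain.
post[["0/1"]] / post[["0/0"]]
```

A ratio close to 1 indicates that two genotypes are nearly equally likely, which is the situation the example describes for sample NA11831.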