VariantAnnotation (version 1.18.5)

snpSummary: Counts and distribution statistics for SNPs in a VCF object

Description

Counts and distribution statistics for SNPs in a VCF object

Usage

snpSummary(x, ...)

Arguments

x
A CollapsedVCF object.
...
Additional arguments to methods.

Value

The object returned is a data.frame with seven columns.
g00
Counts for genotype 00 (homozygous reference).
g01
Counts for genotype 01 or 10 (heterozygous).
g11
Counts for genotype 11 (homozygous alternate).
a0Freq
Frequency of the reference allele.
a1Freq
Frequency of the alternate allele.
HWEzscore
Z-score for departure from a null hypothesis of Hardy-Weinberg equilibrium.
HWEpvalue
p-value for departure from a null hypothesis of Hardy-Weinberg equilibrium.

Details

Genotype counts, allele frequencies and Hardy-Weinberg equilibrium (HWE) statistics are calculated for single nucleotide variants in a CollapsedVCF object. HWE is a well-established quality filter for genotype data. Under random mating, the equilibrium is expected to be reached in a single generation. Departures from HWE are indicated by small p-values and almost always point to a problem with the genotype calls.
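
The statistics described above can be illustrated with a minimal base R sketch. It uses a textbook Pearson chi-square test with one degree of freedom, which is an assumption here and not necessarily the exact computation performed by snpSummary. The genotype counts are hypothetical, and the sign convention for the z-score (positive for an excess of heterozygotes) is likewise assumed.

```r
## Hypothetical genotype counts for one biallelic SNP (not taken from ex2.vcf).
g00 <- 50; g01 <- 40; g11 <- 10
n <- g00 + g01 + g11                  # number of diploid samples

## Allele frequencies: every diploid sample contributes two alleles.
a0Freq <- (2 * g00 + g01) / (2 * n)   # reference allele
a1Freq <- (g01 + 2 * g11) / (2 * n)   # alternate allele

## Genotype counts expected under Hardy-Weinberg equilibrium.
observed <- c(g00 = g00, g01 = g01, g11 = g11)
expected <- n * c(g00 = a0Freq^2,
                  g01 = 2 * a0Freq * a1Freq,
                  g11 = a1Freq^2)

## Pearson chi-square statistic (1 degree of freedom for a biallelic site).
chisq <- sum((observed - expected)^2 / expected)

## Signed z-score. ASSUMPTION: positive means more heterozygotes than expected.
z <- sign(observed[["g01"]] - expected[["g01"]]) * sqrt(chisq)

## p-value for departure from HWE.
p <- pchisq(chisq, df = 1, lower.tail = FALSE)

data.frame(g00, g01, g11, a0Freq, a1Freq, HWEzscore = z, HWEpvalue = p)
```

With these counts the expected genotype counts are 49, 42 and 9, so the observed counts are close to equilibrium and the p-value is large.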

The following caveats apply:

  • No distinction is made between phased and unphased genotypes.
  • Only diploid calls are included.
  • Only 'valid' SNPs are included. A 'valid' SNP is defined as having a reference allele of length 1 and a single alternate allele of length 1.

Statistics for variants that do not meet these criteria are set to NA.
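
The caveats above can be sketched in base R as follows. This is an illustration only: the alleles and genotype strings are hypothetical, the helper classify_gt is not part of VariantAnnotation, and the package's internal checks may differ.

```r
## Hypothetical REF and ALT alleles for five variants (illustrative only).
ref <- c("A", "G", "T", "C", "GTC")
alt <- list("G", "A", c("A", "C"), character(0), "G")

## A 'valid' SNP has a length-1 reference allele and exactly one
## alternate allele, also of length 1.
one_base_alt <- vapply(alt,
                       function(a) length(a) == 1 && nchar(a) == 1,
                       logical(1))
is_valid_snp <- nchar(ref) == 1 & one_base_alt
is_valid_snp
## [1]  TRUE  TRUE FALSE FALSE FALSE

## Hypothetical genotype strings: phased ("|") and unphased ("/") calls,
## one haploid call and one missing call.
gt <- c("0/0", "0|1", "1/0", "1|1", "1", "./.")

## Hypothetical helper: splits on either separator so that phased and
## unphased calls are treated alike, and returns NA for calls that are
## not diploid or contain a missing allele.
classify_gt <- function(g) {
  a <- strsplit(g, "[/|]")[[1]]
  if (length(a) != 2 || any(a == ".")) return(NA_character_)
  switch(paste(sort(a), collapse = ""),
         "00" = "g00", "01" = "g01", "11" = "g11",
         NA_character_)
}

gt_class <- vapply(gt, classify_gt, character(1), USE.NAMES = FALSE)
gt_class
## [1] "g00" "g01" "g01" "g11" NA    NA
```

Sorting the two alleles before matching is what makes "0|1" and "1/0" fall into the same heterozygous class.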

See Also

genotypeToSnpMatrix, probabilityToSnpMatrix

Examples

  library(VariantAnnotation)

  ## Read the example VCF file shipped with the package.
  fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
  vcf <- readVcf(fl, "hg19")

  ## The return value is a data.frame with genotype counts
  ## and allele frequencies.
  df <- snpSummary(vcf)
  df

  ## Compare to ranges in the VCF object:
  rowRanges(vcf)

  ## No statistics were computed for the variants in rows 3, 4
  ## and 5, so their values are NA: row 3 has two alternate
  ## alleles, row 4 has none and row 5 is not a SNP.