VariantAnnotation (version 1.18.1)

snpSummary: Counts and distribution statistics for SNPs in a VCF object

Description

Counts and distribution statistics for SNPs in a VCF object

Usage

## S4 method for signature 'CollapsedVCF'
snpSummary(x, ...)

Arguments

x
A CollapsedVCF object.
...
Additional arguments to methods.

Value

The object returned is a data.frame with seven columns holding genotype counts, allele frequencies and Hardy-Weinberg equilibrium statistics for each variant.

Details

Genotype counts, allele counts and Hardy-Weinberg equilibrium (HWE) statistics are calculated for single nucleotide variants in a CollapsedVCF object. Testing for HWE has been established as a useful quality filter on genotype data. Under random mating, this equilibrium should be attained in a single generation. Departures from HWE are indicated by small p-values and are almost invariably indicative of a problem with genotype calls.
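
To make the HWE check concrete, here is a minimal base R sketch of a one-degree-of-freedom chi-square test computed from diploid genotype counts. The helper name `hwe_chisq` and its arguments are hypothetical, and this is a textbook formulation, not necessarily the computation that snpSummary performs internally.

```r
# Hypothetical helper: chi-square test for departure from HWE.
# n_rr = homozygous reference count, n_ra = heterozygous count,
# n_aa = homozygous alternate count (all diploid calls).
hwe_chisq <- function(n_rr, n_ra, n_aa) {
  n <- n_rr + n_ra + n_aa
  p <- (2 * n_rr + n_ra) / (2 * n)  # reference allele frequency
  q <- 1 - p                        # alternate allele frequency
  if (p == 0 || p == 1) {
    # Monomorphic site: the test is undefined.
    return(list(ref_freq = p, alt_freq = q,
                statistic = NA_real_, p_value = NA_real_))
  }
  expected <- n * c(p^2, 2 * p * q, q^2)  # counts expected under HWE
  observed <- c(n_rr, n_ra, n_aa)
  stat <- sum((observed - expected)^2 / expected)
  list(ref_freq = p, alt_freq = q, statistic = stat,
       p_value = pchisq(stat, df = 1, lower.tail = FALSE))
}

in_hwe  <- hwe_chisq(25, 50, 25)  # counts match HWE expectations exactly
no_hets <- hwe_chisq(50, 0, 50)   # no heterozygotes at all
```

In this toy input, `in_hwe` has a statistic of 0, while `no_hets` yields a very small p-value, which is the kind of departure that would flag a variant for closer inspection.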

The following caveats apply:

  • No distinction is made between phased and unphased genotypes.
  • Only diploid calls are included.
  • Only 'valid' SNPs are included. A 'valid' SNP is defined as having a reference allele of length 1 and a single alternate allele of length 1.
Statistics for variants that do not meet these criteria are set to NA.
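
The 'valid' SNP rule above can be sketched in base R as follows. The helper `is_valid_snp` is hypothetical, and the alleles are toy values made up for illustration (a character vector of reference alleles and a list of alternate alleles per variant); the package applies its own check to the VCF object.

```r
# Hypothetical helper: TRUE where the reference allele has length 1 and
# there is exactly one alternate allele, also of length 1.
is_valid_snp <- function(ref, alt) {
  single_base_alt <- vapply(
    alt,
    function(a) length(a) == 1 && nchar(a) == 1,
    logical(1)
  )
  nchar(ref) == 1 & single_base_alt
}

# Toy input: two SNPs, a multi-allelic site, a site with no alternate
# allele, and a site whose reference allele is longer than one base.
ref <- c("G", "T", "A", "T", "GTC")
alt <- list("A", "A", c("G", "T"), character(0), c("G", "GTCT"))

valid <- is_valid_snp(ref, alt)
```

Only the first two toy variants pass; the remaining three would be reported with NA statistics.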

See Also

genotypeToSnpMatrix, probabilityToSnpMatrix

Examples

library(VariantAnnotation)

fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
vcf <- readVcf(fl, "hg19")

## The return value is a data.frame with genotype counts
## and allele frequencies.
df <- snpSummary(vcf)
df

## Compare to ranges in the VCF object:
rowRanges(vcf)

## No statistics were computed for the variants in rows 3, 4
## and 5. Their entries are NA because row 3 has two alternate
## alleles, row 4 has none and row 5 is not a SNP.
