VariantAnnotation (version 1.18.1)

scanVcf: Import VCF files

Description

Import Variant Call Format (VCF) files in text or binary format

Usage

scanVcfHeader(file, ...)
## S4 method for signature 'character'
scanVcfHeader(file, ...)

scanVcf(file, ..., param)
## S4 method for signature 'character,ScanVcfParam'
scanVcf(file, ..., param)
## S4 method for signature 'character,missing'
scanVcf(file, ..., param)
## S4 method for signature 'connection,missing'
scanVcf(file, ..., param)

## S4 method for signature 'TabixFile'
scanVcfHeader(file, ...)
## S4 method for signature 'TabixFile,missing'
scanVcf(file, ..., param)
## S4 method for signature 'TabixFile,ScanVcfParam'
scanVcf(file, ..., param)
## S4 method for signature 'TabixFile,GRanges'
scanVcf(file, ..., param)
## S4 method for signature 'TabixFile,RangesList'
scanVcf(file, ..., param)

Arguments

file
For scanVcf and scanVcfHeader, the character() file name, TabixFile, or class connection (file() or bgzip()) of the VCF file to be processed.
param
An instance of ScanVcfParam influencing which records are parsed and the INFO and GENO information returned.
...
Additional arguments for methods

Value

  • scanVcfHeader returns a VCFHeader object with header information parsed into five categories: samples, meta, fixed, info and geno. Each can be accessed with a 'getter' of the same name (e.g., info()).

  • scanVcf returns a list, with one element per range. Each list has 7 elements, obtained from the columns of the VCF specification. The GENO element is itself a list, with elements corresponding to those defined in the VCF file header. For scanVcf, elements of GENO are returned as a matrix of records x samples; if the description of the element in the file header indicates multiplicity other than 1 (e.g., variable number for A, G, or .), then each entry in the matrix is a character string with sub-entries comma-delimited.
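As a minimal sketch of the header getters described above, the following reads the header of the ex2.vcf file shipped with VariantAnnotation (the same file used in the Examples section) and calls each getter in turn. It assumes the package is installed; the comments describe what each category is expected to hold rather than exact output.

```r
library(VariantAnnotation)

# Example file shipped with the package (also used in the Examples section).
fl <- system.file("extdata", "ex2.vcf", package = "VariantAnnotation")
hdr <- scanVcfHeader(fl)

# Each of the five categories has a getter of the same name.
samples(hdr)  # sample names
meta(hdr)     # meta-information lines
fixed(hdr)    # header entries describing the fixed fields
info(hdr)     # INFO field definitions
geno(hdr)     # genotype (FORMAT) field definitions
```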

Details

The argument param allows portions of the file to be input, but requires that the file be bgzip'd and indexed as a TabixFile.
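The requirement above can be illustrated with a short sketch: a copy of the example file is compressed with bgzip(), indexed with indexTabix(), and wrapped in a TabixFile before a range is passed through ScanVcfParam. The helper functions come from Rsamtools, and the genomic range is an illustrative assumption chosen for this example, not a value taken from this page.

```r
library(VariantAnnotation)
library(Rsamtools)  # bgzip(), indexTabix(), TabixFile()

fl <- system.file("extdata", "ex2.vcf", package = "VariantAnnotation")

# Compress a copy of the file and build a tabix index for it.
compressed <- bgzip(fl, tempfile(fileext = ".vcf.gz"))
idx <- indexTabix(compressed, format = "vcf")
tab <- TabixFile(compressed, idx)

# Restrict parsing to one region (illustrative coordinates).
rng <- GRanges("20", IRanges(start = 1, end = 1000000))
param <- ScanVcfParam(which = rng)

res <- scanVcf(tab, param = param)
length(res)  # one list element per requested range
```

Passing several ranges in `which` would yield one list element per range, which is why the return value is a list of lists.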

When param is missing and file is a character file name or a connection, scanVcf scans the entire file. With a connection, an argument n indicates the number of lines of the VCF file to input; a connection that is open at the beginning of the call remains open and is advanced by n lines at the end of the call, providing a convenient way to stream through large VCF files.

The INFO field of the scanned VCF file is returned as a single packed vector, as in the VCF file. The GENO field is a list of matrices; each matrix corresponds to a field defined in the FORMAT entries of the VCF header. Each matrix has as many rows as there are records scanned in the VCF file, and as many columns as there are samples. As with the INFO field, the elements of the matrix are packed. INFO and GENO are returned packed to facilitate manipulation, e.g., selecting particular rows or samples in a consistent manner across elements.
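A small sketch of the row and sample selection mentioned above, using the GT matrix from the packaged ex2.vcf file. The choice of the first two records and the first sample is arbitrary and only for illustration; positional indexing is used so the sketch does not depend on how the matrix is named.

```r
library(VariantAnnotation)

fl <- system.file("extdata", "ex2.vcf", package = "VariantAnnotation")
vcf <- scanVcf(fl)

# GT is one of the genotype fields; rows are records, columns are samples.
gt <- vcf[[1]][["GENO"]][["GT"]]
dim(gt)

# Keep the first two records across all samples.
gt_rows <- gt[1:2, , drop = FALSE]

# Keep all records for the first sample only.
gt_sample <- gt[, 1, drop = FALSE]
```

The same row index can be reused on other records-by-samples matrices in GENO, which keeps the selection consistent across fields.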

References

http://vcftools.sourceforge.net/specs.html outlines the VCF specification.

http://samtools.sourceforge.net/mpileup.shtml contains information on the portion of the specification implemented by bcftools.

http://samtools.sourceforge.net/ provides information on samtools.

See Also

readVcf, BcfFile, TabixFile

Examples

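library(VariantAnnotation)

## example VCF file shipped with the package
fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
scanVcfHeader(fl)
vcf <- scanVcf(fl)
## value: list-of-lists
str(vcf)
## genotype fields available, then the GT matrix (records x samples)
names(vcf[[1]][["GENO"]])
vcf[[1]][["GENO"]][["GT"]]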
