Learn R Programming

VariantAnnotation (version 1.18.1)

ScanVcfParam-class: Parameters for scanning VCF files

Description

Use ScanVcfParam() to create a parameter object influencing which records and fields are imported from a VCF file. Record parsing is based on genomic coordinates and requires a Tabix index file. Individual VCF elements can be specified in the fixed, info, geno and samples arguments.

Usage

ScanVcfParam(fixed=character(), info=character(), geno=character(), 
               samples=character(), trimEmpty=TRUE, which, ...)

## Getters and Setters vcfFixed(object) vcfFixed(object) <- value vcfInfo(object) vcfInfo(object) <- value vcfGeno(object) vcfGeno(object) <- value vcfSamples(object) vcfSamples(object) <- value vcfTrimEmpty(object) vcfTrimEmpty(object) <- value vcfWhich(object) vcfWhich(object) <- value

Arguments

fixed
A character() vector of fixed fields to be returned. Possible values are ALT, QUAL and FILTER. The CHROM, POS, ID and REF fields are needed to create the GRanges of variant locations. Because these are essential fields there is no option to request or omit them. If not specified, all fields are returned; if fixed=NA only REF is returned.
info
A character() vector naming the INFO fields to return. scanVcfHeader() returns a vector of available fields. If not specified, all fields are returned; if info=NA no fields are returned.
geno
A character() vector naming the GENO fields to return. scanVcfHeader() returns a vector of available fields. If not specified, all fields are returned; if geno=NA no fields are returned and requests for specific samples are ignored.
samples
A character() vector of sample names to return. samples(scanVcfHeader()) returns all possible names. If not specified, data for all samples are returned; if either samples=NA or geno=NA no fields are returned. Requests for specific samples when geno=NA are ignored.
trimEmpty
A logical(1) indicating whether GENO fields with no values should be returned.
which
A GRanges describing the sequences and ranges to be queried. Variants whose POS lies in the interval(s) [start, end] are returned. If which is not specified all ranges are returned.
object
An instance of class ScanVcfParam.
value
An instance of the corresponding slot, to be assigned to object.
...
Arguments passed to methods.

Objects from the Class

Objects can be created by calls of the form ScanVcfParam().

See Also

readVcf

Examples

Run this code
ScanVcfParam()

  fl <- system.file("extdata", "structural.vcf", package="VariantAnnotation")
  compressVcf <- bgzip(fl, tempfile())
  idx <- indexTabix(compressVcf, "vcf")
  tab <- TabixFile(compressVcf, idx)
  ## ---------------------------------------------------------------------
  ## 'which' argument
  ## ---------------------------------------------------------------------
  ## To subset on genomic coordinates, supply an object 
  ## containing the ranges of interest. These ranges can
  ## be given directly to the 'param' argument or wrapped
  ## inside ScanVcfParam() as the 'which' argument. 

  ## When using a list, the outer list names must correspond to valid
  ## chromosome names in the vcf file. In this example they are "1" 
  ## and "2".
  gr1 <- GRanges("1", IRanges(13219, 2827695, name="regionA"))
  gr2 <- GRanges(rep("2", 2), IRanges(c(321680, 14477080), 
                 c(321689, 14477090), name=c("regionB", "regionC")))
  grl <- GRangesList("1"=gr1, "2"=gr2)
  vcf <- readVcf(tab, "hg19", grl)

  ## Names of the ranges are in the 'paramRangeID' metadata column of the
  ## GRanges object returned by the rowRanges() accessor.
  rowRanges(vcf)

  ## which can be used for subsetting the VCF object
  vcf[rowRanges(vcf)$paramRangeID == "regionA"]
 
  ## When using ranges, the seqnames must correspond to valid
  ## chromosome names in the vcf file.
  gr <- unlist(grl, use.names=FALSE)
  vcf <- readVcf(tab, "hg19", gr)
 
  ## ---------------------------------------------------------------------
  ## 'fixed', 'info', 'geno' and 'samples' arguments
  ## ---------------------------------------------------------------------
  ## This param specifies the "GT" 'geno' field for a single sample
  ## and the subset of ranges in 'which'. All 'fixed' and 'info' fields 
  ## will be returned. 
  ScanVcfParam(geno="GT", samples="NA00002", which=gr)
 
  ## Here two 'fixed' and one 'geno' field are specified
  ScanVcfParam(fixed=c("ALT", "QUAL"), geno="GT", info=NA) 

  ## Return only the 'fixed' fields
  ScanVcfParam(geno=NA, info=NA)

Run the code above in your browser using DataLab