
vcfR (version 1.1.0)

VCF input and output: Read and write vcf format files

Description

Read files in the *.vcf structured text format, as well as the compressed *.vcf.gz format. Write objects of class vcfR to *.vcf.gz.

Usage

read.vcfR(file, limit = 1e+07, nrows = -1, skip = 0, cols = NULL,
  verbose = TRUE)

write.vcf(x, file = "", mask = FALSE, APPEND = FALSE)

Arguments

file

A filename for a variant call format (vcf) file.

limit

amount of memory (in bytes) not to exceed when reading in a file.

nrows

integer specifying the maximum number of rows (variants) to read in.

skip

integer specifying the number of rows (variants) to skip before beginning to read data.

cols

vector of column numbers to extract from file.

verbose

report verbose progress.

x

An object of class vcfR or chromR.

mask

logical vector indicating rows to use.

APPEND

logical indicating whether to append to existing vcf file or write a new file.

Value

read.vcfR returns an object of class vcfR (see vcfR-class). See the vignette: vignette('vcf_data'). The function write.vcf creates a gzipped VCF file.

Details

The function read.vcfR reads files in *.vcf (text) and *.vcf.gz (gzipped text) format and returns an object of class vcfR. The parameter 'limit' is intended to keep the user from reading in a file that contains more data than there is memory to hold. An estimate of the memory required is made from the dimensions of the data matrix. If this estimate exceeds the value of 'limit', an error is thrown and execution stops. The user may increase this limit to any value, but is encouraged to compare that value to the amount of available physical memory.
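The behaviour of 'limit' can be sketched with the bundled example data. This is a minimal illustration, not a recommendation for specific values: the limits of 1 byte and 1e+08 bytes are arbitrary choices made here, and the name vcf_file is hypothetical.

```r
library(vcfR)

# Write the bundled example data to a temporary gzipped VCF file.
data(vcfR_test)
vcf_file <- tempfile(fileext = ".vcf.gz")
write.vcf(vcfR_test, file = vcf_file)

# A limit of 1 byte is far below any realistic memory estimate, so this
# call is expected to stop with an error; tryCatch keeps the message
# instead of halting the script.
too_small <- tryCatch(
  read.vcfR(vcf_file, limit = 1, verbose = FALSE),
  error = function(e) conditionMessage(e)
)

# Raising the limit (here to 1e+08 bytes, an arbitrary value) lets the
# read proceed. Compare your chosen value with available physical memory.
vcf <- read.vcfR(vcf_file, limit = 1e+08, verbose = FALSE)
vcf
```

If the first call does not fail on your installation, too_small will simply hold a vcfR object rather than an error message.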

It is possible to input part of a VCF file by using the parameters nrows, skip and cols. The first eight columns (the fix region) are part of the definition and will always be included. Any columns beyond eight are optional (the gt region). You can specify which of these columns you would like to input by setting the cols parameter. If you want a usable vcfR object you will want to always include column nine (the FORMAT column). If you do not include column nine you may experience reduced functionality.
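A partial read might look like the following sketch, which uses the bundled vcfR_test data written to a temporary file. The particular values of nrows, skip and cols are arbitrary, and the names vcf_file and vcf_part are hypothetical.

```r
library(vcfR)

# Write the bundled example data to a temporary gzipped VCF file.
data(vcfR_test)
vcf_file <- tempfile(fileext = ".vcf.gz")
write.vcf(vcfR_test, file = vcf_file)

# Skip the first variant, read at most two variants, and request only
# column 9 (FORMAT) and column 10 (the first sample) from the gt region.
# Columns 1-8 (the fix region) are always included.
vcf_part <- read.vcfR(vcf_file, nrows = 2, skip = 1,
                      cols = c(9, 10), verbose = FALSE)

dim(vcf_part@fix)       # variants read by the fix-region columns
colnames(vcf_part@gt)   # FORMAT plus the requested sample column
```

Requesting column nine together with at least one sample column keeps the gt region in the shape that the genotype-related functions expect.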

The function write.vcf takes an object of either class vcfR or chromR and writes the vcf data to a vcf.gz file (gzipped text). If the parameter 'mask' is set to FALSE, the entire object is written to file. If the parameter 'mask' is set to TRUE and the object is of class chromR (which has a mask slot), this mask is used to subset the data. If an index is supplied as 'mask', then this index is used, and recycled as necessary, to subset the data.
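As an alternative to the 'mask' parameter, variants can be subset before writing. The sketch below does not use 'mask' at all: it relies on row subsetting of the vcfR object with `[` and then writes the result with the default mask = FALSE. The choice of rows and the names keep, vcf_subset, out_file and vcf_back are hypothetical.

```r
library(vcfR)
data(vcfR_test)

# Keep variants 1, 3 and 5 by subsetting rows of the vcfR object.
keep <- c(1, 3, 5)
vcf_subset <- vcfR_test[keep, ]

# With mask = FALSE (the default) everything in vcf_subset is written.
out_file <- tempfile(fileext = ".vcf.gz")
write.vcf(vcf_subset, file = out_file)

# Read the file back to confirm how many variants were written.
vcf_back <- read.vcfR(out_file, verbose = FALSE)
nrow(vcf_back@fix)
```

Subsetting first makes the selection explicit in the object itself, so the same subset can be inspected before it is written.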

See Also

CRAN: pegas::read.vcf, PopGenome::readVCF, data.table::fread

Bioconductor: VariantAnnotation::readVcf

Use: browseVignettes('vcfR') to find examples.

Examples

# NOT RUN {
data(vcfR_test)
vcfR_test
head(vcfR_test)
# CRAN requires developers to use a tempdir when writing to the filesystem.
# You may want to implement this example elsewhere.
orig_dir <- getwd()
temp_dir <- tempdir()
setwd( temp_dir )
write.vcf( vcfR_test, file = "vcfR_test.vcf.gz" )
vcf <- read.vcfR( file = "vcfR_test.vcf.gz", verbose = FALSE )
vcf
setwd( orig_dir )


# }
