
SeqArray (version 1.8.0)

seqVCF2GDS: Reformat VCF files

Description

Reformat Variant Call Format (VCF) files into a GDS file.

Usage

seqVCF2GDS(vcf.fn, out.fn, header = NULL, genotype.var.name = "GT",
    compress.option = seqCompress.Option(), info.import = NULL,
    fmt.import = NULL, ignore.chr.prefix = "chr", raise.error = TRUE,
    verbose = TRUE)

Arguments

vcf.fn
the file name(s) of the input VCF file(s)
out.fn
the file name of the output GDS file
header
if NULL, the header is taken from seqVCF.Header(vcf.fn)
genotype.var.name
the ID for genotypic data in the FORMAT column; "GT" by default (as in VCFv4.0)
compress.option
the compression options; by default, the value of seqCompress.Option()
info.import
a character vector of the variable name(s) in the INFO field to import; NULL for all variables, or character(0) for none
fmt.import
a character vector of the variable name(s) in the FORMAT field to import; NULL for all variables, or character(0) for none
ignore.chr.prefix
a character vector of chromosome-name prefixes to ignore, such as "chr"; matching is not case-sensitive
raise.error
if TRUE, throw an error when a numeric conversion fails; if FALSE, store a missing value instead
verbose
if TRUE, show information
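
The effect of ignore.chr.prefix can be pictured with a small base R sketch. The helper strip_chr_prefix() below is hypothetical and is not part of SeqArray; it only illustrates what case-insensitive prefix removal on chromosome names looks like, under the assumption that the prefix is dropped from the start of each name.

```r
# Illustrative only: strip_chr_prefix() is a hypothetical helper,
# not SeqArray's implementation of ignore.chr.prefix.
strip_chr_prefix <- function(chrom, prefix = "chr") {
    for (p in prefix) {
        # case-insensitive test for the prefix at the start of each name
        hit <- startsWith(tolower(chrom), tolower(p))
        # drop the prefix characters from the matching names only
        chrom[hit] <- substring(chrom[hit], nchar(p) + 1L)
    }
    chrom
}

strip_chr_prefix(c("chr1", "CHR2", "ChrX", "22", "MT"))
# "1" "2" "X" "22" "MT"
```

Names without the prefix ("22", "MT") pass through unchanged, so files that mix both naming styles end up with consistent chromosome labels.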

Value

Returns the file name of the output GDS file as an absolute path.

Details

GDS -- Genomic Data Structures, used for storing array-oriented genetic data; it is the file format used by the gdsfmt package.

VCF -- the Variant Call Format, a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations.

If vcf.fn contains more than one file, seqVCF2GDS merges all of the datasets into one, provided they contain the same samples. This is useful for combining genomic variants when the VCF data are split by chromosome.
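
A minimal sketch of the per-chromosome case follows. The input and output file names are hypothetical placeholders, and the conversion is skipped unless those files actually exist.

```r
library(SeqArray)

# Hypothetical per-chromosome inputs sharing the same samples;
# replace these with real paths.
vcf.files <- c("study_chr1.vcf", "study_chr2.vcf")

if (all(file.exists(vcf.files))) {
    # a single call with several inputs writes one merged GDS file
    gds.fn <- seqVCF2GDS(vcf.files, "study_merged.gds")

    # open the merged file, show its contents, and close it again
    f <- seqOpen(gds.fn)
    print(f)
    seqClose(f)
}
```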

References

The variant call format and VCFtools. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R; 1000 Genomes Project Analysis Group. Bioinformatics. 2011 Aug 1;27(15):2156-8. Epub 2011 Jun 7.

http://corearray.sourceforge.net/

See Also

seqVCF.Header, seqCompress.Option, seqGDS2VCF

Examples

# the file name of VCF
vcf.fn <- seqExampleFileName("vcf")
# or vcf.fn <- "C:/YourFolder/Your_VCF_File.vcf"

# convert
seqVCF2GDS(vcf.fn, "tmp.gds")

# display
(f <- seqOpen("tmp.gds"))
seqClose(f)



# convert without the INFO fields
seqVCF2GDS(vcf.fn, "tmp.gds", info.import=character(0))

# display
(f <- seqOpen("tmp.gds"))
seqClose(f)



# convert without the INFO and FORMAT fields
seqVCF2GDS(vcf.fn, "tmp.gds",
    info.import=character(0), fmt.import=character(0))

# display
(f <- seqOpen("tmp.gds"))
seqClose(f)
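
The header argument can also be supplied explicitly. The sketch below parses the header with seqVCF.Header() and passes the result to seqVCF2GDS(), which should match the default behaviour when header = NULL; writing to a temporary file is a choice made here to avoid overwriting "tmp.gds".

```r
library(SeqArray)

vcf.fn <- seqExampleFileName("vcf")

# parse the VCF header once and look at its top-level components
hdr <- seqVCF.Header(vcf.fn)
str(hdr, max.level = 1)

# convert using the pre-parsed header; the output goes to a temporary file
gds.fn <- seqVCF2GDS(vcf.fn, tempfile(fileext = ".gds"),
    header = hdr, verbose = FALSE)

# the returned value is the path of the new GDS file
(f <- seqOpen(gds.fn))
seqClose(f)
```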
