snpgdsVCF2GDS(vcf.fn, out.fn, method=c("biallelic.only", "copy.num.of.ref"), snpfirstdim=FALSE, compress.annotation="ZIP_RA.max", compress.geno="", ref.allele=NULL, ignore.chr.prefix="chr", verbose=TRUE)
vcf.fn: can be a vector of file names; see details.
compress.annotation, compress.geno: see add.gdsn for the available compression methods.
ref.allele: NULL, or a character vector giving the reference allele (like "A", "G", "T", NA, ...) for each site, where NA means to use the original reference allele in the VCF file(s). The length of the character vector should be the total number of variants in the VCF file(s).
verbose: if TRUE, show information.
VCF -- The Variant Call Format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations.
If vcf.fn contains more than one file name, snpgdsVCF2GDS will merge all of the datasets together, provided they all contain the same samples. This is useful for combining genetic/genomic data when the VCF data are divided by chromosome.
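The "same samples" precondition can be checked before a merge. The following is a minimal base-R sketch, not part of SNPRelate: the helpers `vcf_samples` and `same_samples` are hypothetical names introduced here, and the two small VCF files are made-up test data written to temporary paths.

```r
# Hypothetical helper: return the sample IDs listed on the "#CHROM" header line.
vcf_samples <- function(path) {
  lines <- readLines(path)
  header <- lines[startsWith(lines, "#CHROM")][1]
  fields <- strsplit(header, "\t", fixed = TRUE)[[1]]
  fields[-(1:9)]  # the first 9 columns are the fixed VCF fields
}

# Hypothetical helper: TRUE if every file lists identical samples in the same order.
same_samples <- function(paths) {
  ids <- lapply(paths, vcf_samples)
  all(vapply(ids, identical, logical(1), ids[[1]]))
}

# Made-up data: one variant on chromosome 1 and one on chromosome 2.
header <- paste(c("#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
                  "INFO", "FORMAT", "NA00001", "NA00002"), collapse = "\t")
row1 <- paste(c("1", "100", "rs1", "A", "G", ".", "PASS", ".", "GT",
                "0|0", "0|1"), collapse = "\t")
row2 <- paste(c("2", "200", "rs2", "C", "T", ".", "PASS", ".", "GT",
                "1|1", "0|1"), collapse = "\t")
f1 <- tempfile(fileext = ".vcf")
f2 <- tempfile(fileext = ".vcf")
writeLines(c("##fileformat=VCFv4.0", header, row1), f1)
writeLines(c("##fileformat=VCFv4.0", header, row2), f2)

same_samples(c(f1, f2))
```

If the check passes, the vector of file names can be given as vcf.fn in a single call to snpgdsVCF2GDS.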
method = "biallelic.only": to extract bi-allelic and polymorphic SNP data (excluding monomorphic variants);
method = "copy.num.of.ref": to extract and store the dosage (0, 1, 2) of the reference allele for all variant sites, including bi-allelic SNPs, multi-allelic SNPs, indels and structural variants.
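To make "dosage of the reference allele" concrete, here is a small base-R sketch that counts copies of the reference allele (coded "0" in a VCF GT field) for diploid calls. It illustrates the idea only and is not SNPRelate's internal code; `ref_dosage` is a hypothetical helper, and returning NA for missing calls is a choice made for this sketch.

```r
# Hypothetical helper: number of reference alleles ("0") in each GT string.
ref_dosage <- function(gt) {
  vapply(strsplit(gt, "[|/]"), function(alleles) {
    if (any(alleles == ".")) return(NA_integer_)  # missing call
    sum(alleles == "0")                           # count reference copies
  }, integer(1))
}

ref_dosage(c("0|0", "0/1", "1|1", "./.", "1|2"))
# 2 1 0 NA 0
```

Note that the multi-allelic call "1|2" has dosage 0, because neither allele is the reference.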
Haploid and triploid calls are allowed in the transfer. The variable snp.id stores the original row index of each variant, and the variable snp.rs.id stores the rs id.
When snp.chromosome in the GDS file is character, SNPRelate treats a chromosome as an autosome only if it can be converted to a numeric value (like "1", "22"). It uses "X" and "Y" for non-autosomes instead of numeric codes. However, some software formats chromosomes in VCF files with the prefix "chr". Users should remove that prefix when importing VCF files by setting ignore.chr.prefix = "chr".
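The rule described above can be mimicked in base R as follows. This is only an illustration of the stated rule (strip the prefix, then test for numeric conversion), not the package's own implementation; the chromosome labels are made-up examples.

```r
chrom <- c("chr1", "chr22", "chrX", "chrY", "7")

# Remove a leading "chr" prefix, as ignore.chr.prefix = "chr" is described to do.
stripped <- sub("^chr", "", chrom)

# A label counts as an autosome only if it converts to a numeric value.
is_autosome <- !is.na(suppressWarnings(as.numeric(stripped)))

stripped
# "1" "22" "X" "Y" "7"
is_autosome
# TRUE TRUE FALSE FALSE TRUE
```

Without the prefix removal, "chr1" and "chr22" would fail the numeric conversion and would not be treated as autosomes.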
See also: snpgdsBED2GDS
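library(SNPRelate)
# the VCF file bundled with SNPRelate
vcf.fn <- system.file("extdata", "sequence.vcf", package="SNPRelate")
cat(readLines(vcf.fn), sep="\n")
# bi-allelic, polymorphic SNPs only
snpgdsVCF2GDS(vcf.fn, "test1.gds", method="biallelic.only")
snpgdsSummary("test1.gds")
# the same conversion with snpfirstdim=TRUE
snpgdsVCF2GDS(vcf.fn, "test2.gds", method="biallelic.only", snpfirstdim=TRUE)
snpgdsSummary("test2.gds")
# dosage of the reference allele for all variant sites, with snpfirstdim=TRUE
snpgdsVCF2GDS(vcf.fn, "test3.gds", method="copy.num.of.ref", snpfirstdim=TRUE)
snpgdsSummary("test3.gds")
# dosage of the reference allele for all variant sites
snpgdsVCF2GDS(vcf.fn, "test4.gds", method="copy.num.of.ref")
snpgdsSummary("test4.gds")
# dosage with user-specified reference alleles, one per variant
snpgdsVCF2GDS(vcf.fn, "test5.gds", method="copy.num.of.ref",
    ref.allele=c("A", "T", "T", "T", "A"))
snpgdsSummary("test5.gds")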
# open "test1.gds"
(genofile <- snpgdsOpen("test1.gds"))
read.gdsn(index.gdsn(genofile, "sample.id"))
read.gdsn(index.gdsn(genofile, "snp.rs.id"))
read.gdsn(index.gdsn(genofile, "genotype"))
# close the file
snpgdsClose(genofile)
# open "test2.gds"
(genofile <- snpgdsOpen("test2.gds"))
read.gdsn(index.gdsn(genofile, "sample.id"))
read.gdsn(index.gdsn(genofile, "snp.rs.id"))
read.gdsn(index.gdsn(genofile, "genotype"))
# close the file
snpgdsClose(genofile)
# open "test3.gds"
(genofile <- snpgdsOpen("test3.gds"))
read.gdsn(index.gdsn(genofile, "sample.id"))
read.gdsn(index.gdsn(genofile, "snp.rs.id"))
read.gdsn(index.gdsn(genofile, "genotype"))
# close the file
snpgdsClose(genofile)
# open "test4.gds"
(genofile <- snpgdsOpen("test4.gds"))
read.gdsn(index.gdsn(genofile, "sample.id"))
read.gdsn(index.gdsn(genofile, "snp.rs.id"))
read.gdsn(index.gdsn(genofile, "snp.allele"))
read.gdsn(index.gdsn(genofile, "genotype"))
# close the file
snpgdsClose(genofile)
# open "test5.gds"
(genofile <- snpgdsOpen("test5.gds"))
read.gdsn(index.gdsn(genofile, "sample.id"))
read.gdsn(index.gdsn(genofile, "snp.rs.id"))
read.gdsn(index.gdsn(genofile, "snp.allele"))
read.gdsn(index.gdsn(genofile, "genotype"))
# close the file
snpgdsClose(genofile)
# delete the temporary files
unlink(paste("test", 1:5, ".gds", sep=""), force=TRUE)