onemap (version 2.1.1)

vcf2raw: Convert variants from a VCF file to OneMap file format

Description

Converts data from a standard VCF (Variant Call Format) file to the input format required by OneMap, while trying to identify the appropriate marker segregation patterns.

Usage

vcf2raw(input = NULL, output = NULL, cross = c("outcross",
  "f2 intercross", "f2 backcross", "ri self", "ri sib"), parent1 = NULL,
  parent2 = NULL, min_class = 1)

Arguments

input

path to the input VCF file.

output

path to the output OneMap file.

cross

type of cross. Must be one of: "outcross" for full-sibs; "f2 intercross" for an F2 intercross progeny; "f2 backcross" for a backcross progeny; "ri self" for recombinant inbred lines by self-mating; or "ri sib" for recombinant inbred lines by sib-mating.

parent1

string or vector of strings specifying sample ID(s) of the first parent.

parent2

string or vector of strings specifying sample ID(s) of the second parent.

min_class

a real number between 0.0 and 1.0 (default 1.0). For each parent and each variant site, defines the minimum proportion of that parent's samples that must share a genotype for that genotype to be assigned to the parent.

Details

The input VCF file must be sorted, bgzip-compressed and tabix-indexed. See the functions bgzip and indexTabix in package Rsamtools for details.
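The compression and indexing step can be sketched as follows. This is a minimal illustration, not part of onemap: the helper name prepare_vcf and the file names are hypothetical, the Bioconductor package Rsamtools is assumed to be installed, and the VCF is assumed to be sorted already (neither call sorts it).

```r
# Sketch: compress and index a VCF so that it can be passed to vcf2raw().
# Assumes Rsamtools is installed and the VCF is already sorted by
# reference sequence and position.
prepare_vcf <- function(vcf_path) {
  # Fail early if the (hypothetical) input path does not exist.
  if (!file.exists(vcf_path)) {
    stop("VCF file not found: ", vcf_path)
  }
  # bgzip() writes a block-compressed copy and returns its path.
  gz_path <- Rsamtools::bgzip(vcf_path, dest = paste0(vcf_path, ".gz"),
                              overwrite = TRUE)
  # indexTabix() creates the tabix index for the compressed file.
  Rsamtools::indexTabix(gz_path, format = "vcf")
  gz_path
}

# Hypothetical usage:
# gz <- prepare_vcf("your_VCF_file.vcf")
```

The returned path is the compressed file, which is what the input argument expects.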

Each variant in the VCF file is processed independently. Only biallelic SNPs and indels for diploid variant sites are considered.

Genotype information on the parents is required for all cross types. For full-sib progenies, both outbred parents must be genotyped. For backcrosses, F2 intercrosses and recombinant inbred lines, the original inbred lines must be genotyped. For backcross progenies in particular, the recurrent line must be provided as the first parent in the function arguments.

First, the samples corresponding to both parents of the progeny are parsed and their genotypes identified, provided that their replicates are concordant at or above the threshold given by min_class. This allows replicates of the parents to be used, which is common in sequencing plates. In detail, each parent is called a heterozygote only if at least \(\text{min\_class} \times \text{number of replicates}\) samples are heterozygous. The same rule applies to homozygous calls. Whenever replicates differ in genotype, heterozygosity is checked first. The default value (1.0) requires all replicates to have the same genotype. If each parent is represented by a single sample, this parameter has no effect.
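The replicate rule can be illustrated with a small stand-alone sketch. It is not onemap's internal code: the function name call_parent_genotype and the genotype labels "het", "hom_ref" and "hom_alt" are hypothetical, and missing replicates (NA) are assumed to count toward the number of replicates but toward no genotype class.

```r
# Sketch of the min_class rule for one parent at one variant site.
# `replicates` holds one genotype label per parent sample.
call_parent_genotype <- function(replicates, min_class = 1) {
  stopifnot(min_class >= 0, min_class <= 1, length(replicates) > 0)
  # Number of concordant samples required for a call.
  needed <- min_class * length(replicates)
  # Heterozygosity is checked first, then the homozygous classes.
  for (gt in c("het", "hom_ref", "hom_alt")) {
    n_gt <- sum(replicates == gt, na.rm = TRUE)
    if (n_gt > 0 && n_gt >= needed) {
      return(gt)
    }
  }
  # No class reaches the threshold: genotype undetermined.
  NA_character_
}

call_parent_genotype(c("het", "hom_ref"), min_class = 0.5)
# "het": one of two replicates is enough
call_parent_genotype(c("het", "hom_ref", "hom_ref"), min_class = 0.5)
# "hom_ref": at least two of three replicates must agree
call_parent_genotype(c("het", "hom_ref"))
# NA: the default requires all replicates to agree
```

With a single sample per parent the threshold is at most one, so any value of min_class gives the same call.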

Next, marker type is determined based on parental genotypes. Finally, progeny genotypes are identified and output is produced. Variants for which parent genotypes cannot be determined are discarded.
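As an illustration of how parental genotypes can determine marker type, the sketch below covers the outcross case for biallelic markers only. It is an assumption-laden example rather than onemap's actual code: the function name outcross_marker_type and the genotype labels are hypothetical, and the type codes follow the outcross notation used in OneMap input files (B3.7 for ab x ab, D1.10 for ab x aa, D2.15 for aa x ab).

```r
# Sketch: biallelic marker type for an outcross from two parental calls.
# Each parent is "het", "hom_ref", "hom_alt", or NA when undetermined.
outcross_marker_type <- function(parent1, parent2) {
  # Variants with an undetermined parent cannot be typed.
  if (is.na(parent1) || is.na(parent2)) {
    return(NA_character_)
  }
  p1_het <- parent1 == "het"
  p2_het <- parent2 == "het"
  if (p1_het && p2_het) return("B3.7")   # both parents heterozygous
  if (p1_het) return("D1.10")            # only parent 1 heterozygous
  if (p2_het) return("D2.15")            # only parent 2 heterozygous
  # Both parents homozygous: treated here as uninformative.
  NA_character_
}

outcross_marker_type("het", "het")      # "B3.7"
outcross_marker_type("het", "hom_ref")  # "D1.10"
outcross_marker_type("hom_alt", "het")  # "D2.15"
```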

Reference sequence ID and position for each variant site are stored as special fields denoted CHROM and POS.

See Also

read_onemap for a description of the OneMap file format.

Examples

# NOT RUN {
    vcf2raw(input="your_VCF_file.vcf.gz",
            output="your_OneMap_file.raw",
            cross="your_cross_type",
            parent1=c("PAR1_sample1", "PAR1_sample2"),
            parent2=c("PAR2_sample1", "PAR2_sample2", "PAR2_sample3"),
            min_class=0.5) # for parent1, a single heterozygote replicate results
                           # in a heterozygote genotype call; for parent2, at
                           # least two samples have to be concordant
  
# }