The input VCF file must be sorted, compressed and tabix indexed. Please check
functions bgzip and indexTabix of package Rsamtools for details.
Each variant in the VCF file is processed independently. Only biallelic SNPs
and indels for diploid variant sites are considered.
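The filter described above can be illustrated with a minimal Python sketch. This is not the package's implementation: the function name is_biallelic_diploid is hypothetical, and the handling of missing or symbolic alleles is an assumption made only for illustration.

```python
import re


def is_biallelic_diploid(vcf_line: str) -> bool:
    """Return True if a VCF data line has exactly one ALT allele and
    every non-missing sample genotype (GT) has two alleles."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return False  # no FORMAT or sample columns present
    alt, fmt, samples = fields[4], fields[8], fields[9:]

    # Biallelic: a single ALT allele. Assumption: missing ("."), spanning
    # deletion ("*") and symbolic ("<...>") alleles are rejected.
    if "," in alt or alt in (".", "*") or alt.startswith("<"):
        return False

    keys = fmt.split(":")
    if "GT" not in keys:
        return False
    gt_idx = keys.index("GT")

    for sample in samples:
        gt = sample.split(":")[gt_idx]
        if gt == ".":
            continue  # assumption: a fully missing call does not disqualify the site
        # Alleles are separated by "/" (unphased) or "|" (phased)
        if len(re.split(r"[/|]", gt)) != 2:
            return False
    return True
```

For example, a line with ALT "G" and genotypes such as 0/1 or 1|1 passes, while a line with ALT "G,T" or a haploid call such as "1" does not.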
Genotype information on the parents is required for all cross types. For
full-sib progenies, both outbred parents must be genotyped. For backcrosses,
F2 intercrosses and recombinant inbred lines, the original inbred
lines must be genotyped. For backcross progenies in particular, the
recurrent line must be provided as the first parent in the function
arguments.
First, samples corresponding to both parents of the progeny are parsed and
their genotypes identified, provided that their replicates are concordant above
a threshold given by min_class. This allows replicates of the parents
to be used, which is common in sequencing plates. In detail, each parent is
called a heterozygote only if at least
\(\mathrm{min\_class} \times \mathrm{number\ of\ replicates}\)
samples are heterozygous. The same rule applies to homozygous calls.
Whenever replicates disagree, heterozygosity is
checked first. The default value (1.0) requires that all replicates have
the same genotype. If each parent is represented by a single sample, this
parameter has no effect.
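The replicate rule can be sketched in Python as follows. This is an illustration under stated assumptions rather than the package's code: the function name call_parent and the genotype labels "het", "hom_ref" and "hom_alt" are hypothetical, every replicate is assumed to count toward the total, and the order in which the two homozygous classes are checked is an assumption (only "heterozygosity first" comes from the description above).

```python
from collections import Counter


def call_parent(replicate_genotypes, min_class=1.0):
    """Return the consensus genotype class of one parent, or None if no
    class reaches min_class * number of replicates."""
    n = len(replicate_genotypes)
    if n == 0:
        return None
    counts = Counter(replicate_genotypes)
    # Small tolerance guards against floating-point error in the product
    needed = min_class * n - 1e-9
    # Heterozygosity is checked first, then the homozygous classes
    for genotype_class in ("het", "hom_ref", "hom_alt"):
        if counts[genotype_class] >= needed:
            return genotype_class
    return None  # replicates not concordant enough; the variant would be discarded


# Three replicates, two heterozygous: unresolved at the default threshold,
# heterozygous once the threshold is lowered to 0.6 (needs 1.8 samples).
strict = call_parent(["het", "het", "hom_ref"])
relaxed = call_parent(["het", "het", "hom_ref"], min_class=0.6)
```

With a single sample per parent, that sample's class always meets the threshold for any min_class between 0 and 1, which matches the statement that the parameter then has no effect.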
Next, marker type is determined based on parental genotypes. Finally, progeny
genotypes are identified and output is produced. Variants for which parent
genotypes cannot be determined are discarded.
Reference sequence ID and position for each variant site are stored as special
fields denoted CHROM and POS.