The missing information is calculated from the multipoint genotype
probabilities obtained with calc.genoprob.
The entropy version of the missing information: for a single
individual at a single genomic position, we measure the missing
information as $H = -\sum_g p_g \log p_g / \log n$, where $p_g$ is the probability of
genotype $g$, and $n$ is the number of possible genotypes,
defining $0 \log 0 = 0$. This takes values between 0
and 1, assuming the value 1 when the genotypes (given the marker data)
are equally likely and 0 when the genotypes are completely determined.
We calculate the missing information at a particular position as the
average of $H$ across individuals. For an intercross, we scale not
by $\log n$ but by the entropy of the genotype
probabilities (1/4, 1/2, 1/4).
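
The entropy version can be sketched in a few lines. This is a minimal standalone illustration in Python, not the package's C code: the function name `entropy_missing_info`, its `intercross` flag, and the input layout (one list of genotype probabilities per individual, all at a single position) are assumptions made for the example.

```python
import math

def entropy_missing_info(probs, intercross=False):
    """Average scaled entropy across individuals at one position.

    probs: list of per-individual genotype probability lists
           (hypothetical layout; each inner list sums to 1).
    intercross: if True, scale by the entropy of (1/4, 1/2, 1/4)
                instead of log(n).
    """
    n = len(probs[0])
    if intercross:
        ref = [0.25, 0.5, 0.25]
        scale = -sum(p * math.log(p) for p in ref)
    else:
        scale = math.log(n)

    total = 0.0
    for p_ind in probs:
        # Skip zero probabilities, which implements 0 log 0 = 0.
        h = -sum(p * math.log(p) for p in p_ind if p > 0)
        total += h / scale
    return total / len(probs)
```

For a two-genotype cross, an individual with probabilities (1/2, 1/2) contributes 1 and an individual with a fully determined genotype contributes 0, so a position with one of each has missing information 1/2.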
The variance version of the missing information: we calculate the
average, across individuals, of the variance of the genotype
distribution (conditional on the observed marker data) at a particular
locus, and scale by the maximum such variance.
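
The variance version can be sketched the same way. This is a hedged illustration in Python, not the package's C code: the function name `variance_missing_info`, the numeric genotype codes 0, 1, ..., n-1, and the choice of the theoretical maximum variance (range squared over 4) as the scaling constant are all assumptions; the constant actually used for a given cross type may differ.

```python
def variance_missing_info(probs, codes=None):
    """Average scaled genotype variance across individuals at one position.

    probs: list of per-individual genotype probability lists
           (hypothetical layout; each inner list sums to 1).
    codes: numeric genotype codes; assumed to be 0..n-1 if not given.
    """
    n = len(probs[0])
    if codes is None:
        codes = list(range(n))

    # Assumed scaling: the largest variance any distribution on the
    # codes can have, attained with mass 1/2 on each extreme code.
    max_var = (max(codes) - min(codes)) ** 2 / 4.0

    total = 0.0
    for p_ind in probs:
        mean = sum(c * p for c, p in zip(codes, p_ind))
        var = sum(p * (c - mean) ** 2 for c, p in zip(codes, p_ind))
        total += var / max_var
    return total / len(probs)
```

As with the entropy version, a fully determined genotype contributes 0, and under these assumptions a two-genotype individual with probabilities (1/2, 1/2) contributes 1.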
Calculations are done in C (for the sake of speed in the presence of
little thought about programming efficiency) and the plot is created
by a call to plot.scanone.
Note that summary.scanone may be used to display
the maximum missing information on each chromosome.