Numeric vector with the size of each cluster in the current genotype category
bestConf
Character string with genotype category, or numeric index to the correct
genotype category in polyCent
polyCent
List with all genotype categories and the allele ratios
corresponding to the different clusters. See generatePolyCenters
afList
Numeric vector of values within [0, 0.5], denoting which B allele
frequencies to test in the case of two segregating paralogs (see below)
Value
Data frame with the chi-square test statistic, the degrees of freedom,
the p-value, and both estimated B allele frequencies
Details
The null hypothesis of the test is that the genotype frequencies in the
population follow Hardy-Weinberg (HW) proportions. As most populations
deviate more or less from HW equilibrium, applying a strict threshold
to this test is usually not recommended. Still, it can be a very
powerful test to detect failed clustering, provided a sufficiently low
significance level is used to account for naturally deviating samples.
If the sample contains subjects from different populations, however,
the power of the test may be very low.
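As an illustration of the test principle only, the following is a minimal sketch of a HW chi-square test for a single diploid, biallelic locus in base R. The helper name hwChisq and the example counts are hypothetical and are not part of this package; the function documented here works on cluster sizes and the genotype categories in polyCent instead.

```r
# Hypothetical helper, not part of the package: HW chi-square test for
# one diploid biallelic locus from genotype counts ordered AA, AB, BB.
hwChisq <- function(counts) {
  counts <- as.numeric(counts)          # drop names, keep order AA, AB, BB
  n <- sum(counts)
  # Estimated B allele frequency: each AB carries one B, each BB two
  q <- (counts[2] + 2 * counts[3]) / (2 * n)
  p <- 1 - q
  expected <- n * c(p^2, 2 * p * q, q^2)
  stat <- sum((counts - expected)^2 / expected)
  # 3 genotype classes - 1 - 1 estimated allele frequency
  df <- 1
  data.frame(chi2 = stat, df = df,
             p.value = pchisq(stat, df, lower.tail = FALSE),
             BAF = q)
}

# Counts in exact HW proportions for BAF = 0.4
inHW <- hwChisq(c(AA = 360, AB = 480, BB = 160))

# No heterozygotes at all, as might result from a failed clustering
noHet <- hwChisq(c(AA = 500, AB = 0, BB = 500))
```

In this sketch the first call gives a test statistic of essentially zero, while the second gives a very large statistic and a p-value far below any reasonable significance level.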
At duplicated loci, the observed B allele frequency (BAF) is in
fact the mean BAF across both paralogs. For MSV-5 markers, which are
segregating in both paralogs, the individual BAFs may be estimated
by assuming different candidate values at one paralog. Several
candidate values for one paralog are set in afList, so that the BAF
for the other paralog follows from the observed mean. A value of 0.5
means the BAF is the same at both loci. All values are tested for HW,
and the most likely BAFs at the two paralogs are those resulting in
the highest p-value. The accuracy of these estimates improves the
closer the population is to HW equilibrium.
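The scan over candidate values can be sketched as follows. This is not the package's implementation. The helpers dosageFreqs and scanParalogBAF are hypothetical, the candidates are taken here to be BAF values at one paralog directly (how afList is scaled internally is not assumed), the two paralogs are assumed to be independent and each in HW proportions, and the default of 2 degrees of freedom is likewise an assumption.

```r
# Hypothetical sketch, not the package's code.
# Expected frequencies of B allele dosage 0..4 when two independent
# paralogs with BAFs q1 and q2 are each in HW proportions.
dosageFreqs <- function(q1, q2) {
  hw <- function(q) c((1 - q)^2, 2 * q * (1 - q), q^2)  # dosage 0, 1, 2
  h1 <- hw(q1)
  h2 <- hw(q2)
  out <- numeric(5)
  for (i in 0:2) for (j in 0:2) {
    out[i + j + 1] <- out[i + j + 1] + h1[i + 1] * h2[j + 1]
  }
  out
}

# counts: cluster sizes for B allele dosage 0, 1, 2, 3, 4
# candidates: candidate BAFs at one paralog (assumed parametrization)
# df: degrees of freedom (assumption: 5 classes - 1 - 2 estimated BAFs)
scanParalogBAF <- function(counts, candidates, df = 2) {
  counts <- as.numeric(counts)
  n <- sum(counts)
  qMean <- sum(counts * 0:4) / (4 * n)   # observed mean BAF
  q1 <- candidates
  q2 <- 2 * qMean - q1                   # fixed by the observed mean
  ok <- q2 >= 0 & q2 <= 1                # keep valid frequencies only
  q1 <- q1[ok]
  q2 <- q2[ok]
  stat <- vapply(seq_along(q1), function(k) {
    expected <- n * dosageFreqs(q1[k], q2[k])
    # A class that is observed but has zero expectation rules the
    # candidate out; an empty class with zero expectation adds nothing
    term <- ifelse(expected > 0,
                   (counts - expected)^2 / expected,
                   ifelse(counts > 0, Inf, 0))
    sum(term)
  }, numeric(1))
  res <- data.frame(BAF1 = q1, BAF2 = q2, chi2 = stat,
                    p.value = pchisq(stat, df, lower.tail = FALSE))
  res[order(-res$p.value), ]             # most likely pair first
}

# Counts constructed in exact HW proportions for BAFs 0.1 and 0.5
fit <- scanParalogBAF(c(2025, 4500, 2950, 500, 25),
                      candidates = seq(0, 0.3, by = 0.05))
best <- fit[1, ]
```

Because the mean BAF fixes the second frequency once the first is chosen, only one parameter is scanned. Restricting the candidates to values at or below the observed mean avoids testing each pair twice with the paralogs swapped.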