D.Jost(filename, bias = "correct", object = FALSE, format.table = TRUE,
pm = "pairwise", statistics = "CI", bt = 1000)
Gst.Nei(filename, bias = "correct", object = FALSE, format.table = TRUE,
pm = "pairwise", statistics = "CI", bt = 1000)
object. If object=FALSE (default), the filename has to be a combination of (1) the name of the data file (.txt format) in which the raw data are
correct (default) and uncorrected). When using the correct option, Hs and Ht are transformed into nearly unbiased estimators Hs.est and Ht.est derived by Nei & Chesser
(TRUE or FALSE, depending on the format of the argument filename.
TRUE (default) or FALSE that defines if the format of the table has to be transformed before analysis (see details).
pm="pairwise", default) or otherwise to average the D or Gst values over all populations (pm="overall").
statistics="none"), 95% confidence intervals (statistics="CI"), p-values (statistics="p", testing against the null hypothesis of no genetic
1000) that defines the number of bootstrap resamplings that the p-values and/or the 95% confidence intervals are based on.
getwd() and changed by using the function setwd(). During the calculation, the output is printed in the R console, along with a short description of the kind of data and the names of the respective .txt files. The filenames include the argument filename and the current date.
If you compare more than two populations pairwise and calculate p-values and/or confidence intervals, the estimated end time of the analysis is reported after the first pairwise comparison has completed.
If the same analysis is carried out more than once on the same day on a single dataset, all results are found in the same file, one written below the other and separated by a row of column names (provided the working directory was not changed).
The output files are described in the following paragraphs:
pm="pairwise" or differentiation / fixation is estimated over all populations pm="overall", the result tables comprising the D/Gst values differ slightly.
When overall D or Gst values are evaluated, the output comprises the following two data tables (X stands for D, Dest, Gst or Gst.est values):
If an analysis is carried out more than once on the same day, all results are found in the same file, one written below the other and separated by a row of column names (provided the working directory was not changed).
If an analysis runs for more than one day, the INTERMEDIATE RESULTs are saved in different files according to the date on which they were computed. All INTERMEDIATE RESULTs are nevertheless included in the END RESULT, in which they are finally saved together.
The output comprises data tables with the following information (X stands for D, Dest, Gst or Gst.est values):
format.table=TRUE, a data file called
When you carry out pairwise population comparisons, you will be informed after evaluation of the data for the first population pair, when the whole analysis is estimated to finish.
The data table that has to be transformed by choosing format.table=TRUE can be provided in the following format:
The columns individual and population must be included. The other columns, which list the fragment lengths in base pairs, can be named arbitrarily. It is recommended to give the two columns that refer to the same locus the same name (e.g. locus1.allele.a and locus1.allele.b should both be named Locus1). Mathematical signs like + or - should be avoided, and spaces are not allowed in column names.
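As a rough illustration of the column rules above, the following sketch builds a small table in the untransformed layout and checks its column names. The data values and the object name raw are hypothetical, and the exact layout the package expects in the .txt file (separator, column order) is not shown here.

```r
# Hypothetical toy data: 4 diploid individuals, 2 populations, 2 loci.
# Both allele columns of a locus carry the same name, as recommended.
raw <- data.frame(
  individual = c("ind1", "ind2", "ind3", "ind4"),
  population = c("popA", "popA", "popB", "popB"),
  Locus1 = c(120, 122, 120, 124),  # first allele of locus 1 (base pairs)
  Locus1 = c(122, 122, 124, 124),  # second allele of locus 1
  Locus2 = c(200, 204, 200, 200),  # first allele of locus 2
  Locus2 = c(204, 204, 202, 200),  # second allele of locus 2
  check.names = FALSE              # keep the duplicated locus names as given
)

# The two mandatory columns are present.
has.required <- all(c("individual", "population") %in% names(raw))

# No spaces and no + or - signs in any column name.
names.ok <- !any(grepl("[ +-]", names(raw)))
```

Setting check.names = FALSE is what allows two columns to share the name of their locus; without it, data.frame() would rename the second column.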
Alternatively, when the input data are given in the following format, they do not have to be transformed (format.table=FALSE):
fragment.length represent numbers of base pairs.
Details on confidence interval calculation
95% confidence intervals of the D or Gst values are based on the range of these values from reallocated data sets that are obtained by bootstrapping alleles (or genotypes) of one locus within populations. Hardy-Weinberg equilibrium (HWE) is tested for each locus and each population. If all of the tested populations are in HWE, the alleles of a single locus are randomized within populations. Otherwise, alleles are not inherited independently from each other and genotypes are randomized within populations (Goudet, 1996). The lower and upper 95% confidence limits are evaluated as the lower (0.025) and upper (0.975) quantile bounds of the D or Gst values from the resampled data, using the function quantile:
Empirical D or Gst +(-) upper(lower) quantile bound
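The quantile step described above can be sketched in base R as follows. This is a minimal illustration and not the package's own code: the allele vector is hypothetical, only a single population is resampled, and expected heterozygosity is used as a stand-in statistic instead of D or Gst.

```r
set.seed(1)

# Hypothetical allele sizes (base pairs) at one locus in one population.
alleles <- c(120, 120, 122, 124, 124, 124, 126, 120, 122, 124)

# Stand-in statistic (NOT D or Gst): expected heterozygosity of the sample.
het <- function(a) 1 - sum((table(a) / length(a))^2)

# Resample alleles with replacement within the population, bt times.
bt <- 1000
boot.values <- replicate(bt, het(sample(alleles, replace = TRUE)))

# Lower (0.025) and upper (0.975) quantile bounds of the resampled values.
ci <- quantile(boot.values, probs = c(0.025, 0.975))
```

With more than one population, the same resampling would be repeated separately inside each population before the statistic is recomputed.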
Details on p-value calculation
To test the null hypothesis of absence of genetic differentiation between populations, a bootstrap method is performed in which alleles (or genotypes) of one locus are randomized over all compared populations. Hardy-Weinberg equilibrium (HWE) is tested for each locus and each population. If all of the tested populations are in HWE, the alleles of a single locus are randomized over all populations. Otherwise, alleles are not inherited independently from each other and genotypes are randomized over all populations (Goudet, 1996). Reallocating alleles or genotypes simulates populations that share a common gene pool and are not differentiated. Since the empirical value of genetic differentiation is expected to be larger than a value obtained from within a panmictic population when the tested populations are significantly differentiated, a one-tailed test is carried out. The null hypothesis (panmictic populations) can be rejected at the 5% significance level (p<0.05) when the empirical value is larger than 95% of the bootstrapped test statistics. The p-value is calculated according to Manly (1997, p. 62).
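A minimal base R sketch of such a one-tailed randomization test is given below. Several parts are assumptions made for illustration: the data are hypothetical, the statistic is a simple allele frequency distance and not D or Gst, alleles are reallocated without replacement, and the p-value counts the empirical value among the resampled ones, which is one common convention and may differ from the package's exact computation.

```r
set.seed(1)

# Hypothetical allele sizes at one locus in two populations.
popA <- c(120, 120, 120, 122, 120, 120, 122, 120)
popB <- c(124, 124, 122, 124, 124, 126, 124, 124)
alleles <- c(popA, popB)
pop <- rep(c("A", "B"), times = c(length(popA), length(popB)))

# Stand-in statistic (NOT D or Gst): half the summed absolute difference
# in allele frequencies between the two populations (0 = identical).
freq.diff <- function(a, p) {
  tab <- prop.table(table(p, a), margin = 1)  # allele frequencies per population
  sum(abs(tab[1, ] - tab[2, ])) / 2
}

empirical <- freq.diff(alleles, pop)

# Reallocate alleles over all populations to simulate a common gene pool.
bt <- 1000
null.values <- replicate(bt, freq.diff(sample(alleles), pop))

# One-tailed p-value: share of values at least as large as the empirical one,
# with the empirical value itself included in the count (assumed convention).
p.value <- (sum(null.values >= empirical) + 1) / (bt + 1)
```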
When more than two populations are compared with one another using the option pm="pairwise", the p-values are adjusted to account for the multiple comparisons from one data set, using the function p.adjust of the package stats. They represent the smallest overall significance levels at which the hypothesis would be rejected (Wright, 1992). The p-values giving the significance levels for different loci are adjusted independently from each other. The p-values giving the significance levels for the averaged differentiation over all loci are adjusted to one another. The adjustment is performed by Bonferroni correction, by Holm's method, by Hommel's method and by a method provided by Benjamini and Hochberg. See the help file of the function p.adjust for further information on these methods.
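The four adjustments named above can be reproduced directly with p.adjust. The raw p-values below are hypothetical and only illustrate the call; how the package groups p-values before adjusting them is not shown.

```r
# Hypothetical raw p-values from four pairwise comparisons.
p.raw <- c(0.001, 0.012, 0.030, 0.200)

# The four adjustment methods named in the text, as spelled in p.adjust.
methods <- c("bonferroni", "holm", "hommel", "BH")

# One column of adjusted p-values per method.
adjusted <- sapply(methods, function(m) p.adjust(p.raw, method = m))
```

Each column of adjusted holds the p-values for one method, in the same order as p.raw, so the methods can be compared row by row.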
Test for Hardy-Weinberg equilibrium (HWE)
Before bootstrapping, populations are automatically tested for being in HWE by comparing the empirical numbers of genotypes with those expected under HWE, using the function chisq.test with the arguments simulate.p.value=TRUE and B=10000. This means that the p-value is obtained from a Monte Carlo method with 10000-fold resampling. The null hypothesis of HWE is rejected when p is smaller than 0.05.
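The test described above might look as follows for a single locus with two alleles in one population. The genotype counts are hypothetical, and deriving the expected proportions from the observed allele frequency is an assumption of this sketch, not necessarily how the package sets up the comparison.

```r
set.seed(1)

# Hypothetical genotype counts at one biallelic locus in one population.
obs <- c(AA = 30, AB = 50, BB = 20)
n <- sum(obs)

# Frequency of allele A estimated from the genotype counts.
p.A <- (2 * obs[["AA"]] + obs[["AB"]]) / (2 * n)

# Genotype proportions expected under HWE: p^2, 2pq, q^2.
expected <- c(p.A^2, 2 * p.A * (1 - p.A), (1 - p.A)^2)

# Monte Carlo chi-squared test with 10000-fold resampling.
hwe <- chisq.test(obs, p = expected, simulate.p.value = TRUE, B = 10000)

# HWE is rejected when the p-value is smaller than 0.05.
reject.hwe <- hwe$p.value < 0.05
```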
# loading data from the example files of this package
data(Example.transformed)
Example.t <- Example.transformed
data(Example.untransformed)
Example.u <- Example.untransformed
# Calculating mean Dest values (averaged over all populations) with
# p-values and confidence intervals using only 10 bootstrap resamplings
D.Jost("Example.t", bias="correct", object=TRUE, format.table=FALSE,
pm="overall", statistics="all", bt=10)
# Calculating pairwise Gst values without any statistics
Gst.Nei("Example.u", bias="uncorrected", object=TRUE, format.table=TRUE,
pm="pairwise", statistics="none")
# If you do not know where the results of these example tables have been
# saved, type getwd()