# human

A set of R objects containing observed data from three human
populations and simulated data under three different demographic
models. The data set is used to illustrate model selection and parameter
inference in an ABC framework (see the vignette of the `abc` package for
more details).

`data(human)` loads four R objects:

- `stat.voight` is a data frame with 3 rows and 3 columns and contains the observed summary statistics for three human populations.
- `stat.3pops.sim` is a data frame with 150,000 rows and 3 columns and contains the simulated summary statistics.
- `models` is a vector of character strings of length 150,000 and contains the model indices.
- `par.italy.sim` is a data frame with 50,000 rows and 4 columns and contains the parameter values that were used to simulate data under a population bottleneck model.

The corresponding summary statistics can be subset from the
`stat.3pops.sim` object as `subset(stat.3pops.sim, subset=models=="bott")`.
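
The description above can be checked directly after loading. The following is a minimal sketch that assumes the `abc.data` package is installed; the name `stat.italy.sim` is chosen here for illustration and is not one of the four objects in the data set.

```r
# Assumes the abc.data package is installed.
library(abc.data)
data(human, package = "abc.data")

# Dimensions as stated in the description
dim(stat.voight)      # observed statistics: 3 rows, 3 columns
dim(stat.3pops.sim)   # simulated statistics: 150,000 rows, 3 columns
length(models)        # model indices: 150,000 entries
dim(par.italy.sim)    # bottleneck parameters: 50,000 rows, 4 columns

# Summary statistics simulated under the bottleneck model.
# 'stat.italy.sim' is an illustrative name.
stat.italy.sim <- subset(stat.3pops.sim, subset = models == "bott")

# Each parameter row should have a matching row of summary statistics.
nrow(stat.italy.sim) == nrow(par.italy.sim)
```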

- Keywords
- datasets

##### Usage

`data(human)`

##### Details

Data are provided to estimate the posterior probabilities of classical
demographic scenarios in three human populations: Hausa, Italian, and
Chinese. These three populations represent three continents: Africa,
Europe, and Asia, respectively. `par.italy.sim` may then be used to
estimate the ancestral population size of the European population
assuming a bottleneck model.

It is generally believed that African human populations are expanding,
while human populations from outside of Africa have gone through a
population bottleneck. Tajima's D statistic has classically been used
to detect changes in historical population size. A negative Tajima's D
signifies an excess of low-frequency polymorphisms, indicating
population size expansion. A positive Tajima's D, by contrast, indicates
low levels of both low- and high-frequency polymorphisms, and is thus a
sign of a population bottleneck. In constant-size populations, Tajima's D
is expected to be zero.
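
To make the sign convention concrete, here is a minimal sketch of Tajima's D for a single locus, using the standard formula of Tajima (1989). It is not the code used by `sample_stats`; the function name `tajima_d` and the toy haplotype matrices are hypothetical and serve only as illustration.

```r
# Tajima's D for one locus, from a 0/1 haplotype matrix
# (rows = sampled chromosomes, columns = sites).
# 'tajima_d' is an illustrative helper, not part of abc or abc.data.
tajima_d <- function(hap) {
  n <- nrow(hap)
  k <- colSums(hap)
  k <- k[k > 0 & k < n]            # keep segregating sites only
  S <- length(k)
  if (n < 4 || S == 0) return(NA_real_)

  i  <- seq_len(n - 1)
  a1 <- sum(1 / i)
  a2 <- sum(1 / i^2)
  b1 <- (n + 1) / (3 * (n - 1))
  b2 <- 2 * (n^2 + n + 3) / (9 * n * (n - 1))
  c1 <- b1 - 1 / a1
  c2 <- b2 - (n + 2) / (a1 * n) + a2 / a1^2
  e1 <- c1 / a1
  e2 <- c2 / (a1^2 + a2)

  pi_hat  <- sum(2 * k * (n - k)) / (n * (n - 1))  # mean pairwise differences
  theta_w <- S / a1                                # Watterson's estimator
  (pi_hat - theta_w) / sqrt(e1 * S + e2 * S * (S - 1))
}

# Toy data (hypothetical): 10 chromosomes, 5 segregating sites.
# Every variant carried by a single chromosome: excess of rare variants.
singletons <- diag(1, 10)[, 1:5]
# Every variant carried by half the sample: intermediate frequencies.
intermediate <- matrix(rep(c(1, 0), each = 5), nrow = 10, ncol = 5)

tajima_d(singletons)    # negative
tajima_d(intermediate)  # positive
```

The statistics `TajD.m` and `TajD.v` in this data set are the mean and variance of such per-locus values over 50 loci.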

With the help of the `human` data, one can reach these expected
conclusions for the three human population samples using ABC, in
accordance with the conclusions of Voight et al. (2005), from which the
observed statistics were taken.

##### Format

The `stat.voight` data frame contains the following columns:

- `pi`: The mean nucleotide diversity over 50 loci in 3 human populations: Hausa, Italian, and Chinese.
- `TajD.m`: The mean of Tajima's D statistic over 50 loci in 3 human populations: Hausa, Italian, and Chinese.
- `TajD.v`: The variance of Tajima's D statistic over 50 loci in 3 human populations: Hausa, Italian, and Chinese.

The `stat.3pops.sim` data frame contains the following columns:

- `pi`: The mean nucleotide diversity over 50 simulated loci under 3 demographic scenarios: constant size population, population bottleneck, and population expansion.
- `TajD.m`: The mean of Tajima's D statistic over 50 simulated loci under 3 demographic scenarios: constant size population, population bottleneck, and population expansion.
- `TajD.v`: The variance of Tajima's D statistic over 50 simulated loci under 3 demographic scenarios: constant size population, population bottleneck, and population expansion.

The `par.italy.sim` data frame contains the following columns:

- `Ne`: The effective population size.
- `a`: The intensity of the bottleneck (i.e. the ratio of the population sizes before and during the bottleneck).
- `duration`: The duration of the bottleneck.
- `start`: The start of the bottleneck.
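
These parameter columns can be paired with the bottleneck simulations to approximate posterior distributions. The following is a minimal sketch using the `abc` function from the `abc` package with simple rejection; it assumes both packages are installed and that `stat.voight` has a row named `"italian"`. The name `stat.italy.sim` and the tolerance of 0.05 are illustrative choices.

```r
# Assumes the abc and abc.data packages are installed.
library(abc)
library(abc.data)
data(human, package = "abc.data")

# Summary statistics simulated under the bottleneck model
# ('stat.italy.sim' is an illustrative name).
stat.italy.sim <- subset(stat.3pops.sim, subset = models == "bott")

# Rejection ABC: keep the simulations closest to the observed statistics.
# Assumption: stat.voight has a row named "italian".
res <- abc(target  = stat.voight["italian", ],
           param   = par.italy.sim,
           sumstat = stat.italy.sim,
           tol     = 0.05,
           method  = "rejection")

# Posterior summaries for Ne, a, duration, and start
summary(res)
```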

The `models` vector contains the names of the demographic models.

##### Source

The observed statistics were taken from Voight et al. 2005 (Table 1). The same input parameters as in Voight et al. 2005 were also used to simulate data under the three demographic models. Simulations were performed using the software ms, and the summary statistics were calculated using sample_stats (Hudson 2002).

##### References

B. F. Voight, A. M. Adams, L. A. Frisse, Y. Qian, R. R. Hudson and
A. Di Rienzo (2005) Interrogating multiple aspects of variation in a
full resequencing data set to infer human population size
changes. PNAS **102**, 18508-18513.

R. R. Hudson (2002) Generating samples under a Wright-Fisher neutral
model of genetic variation. Bioinformatics **18**, 337-338.

*Documentation reproduced from package abc.data, version 1.0, License: GPL (>= 3)*