
dartR.sim (version 0.71)

gl.sim.ind.af: Simulate diploid genotypes from per-population allele frequencies

Description

This function generates a diploid SNP dataset by sampling genotypes for a specified number of individuals per population from user-provided allele frequencies. The result is returned as an `adegenet::genlight` object with population and individual metadata.

Usage

gl.sim.ind.af(df, pop.sizes)

Value

A `genlight` object with:

  • diploid SNP genotypes encoded as allele counts (0, 1, or 2 copies of the first allele),

  • `pop()` set to population names,

  • individual IDs of the form `"0_<popIndex>_<i>"`,

  • `other$ind.metrics` containing `sex` (`"m"`/`"f"`) and `phenotype` (`"control"`).

Arguments

df

A `data.frame` with **three** columns: (1) population name, (2) locus name, and (3) frequency of the first allele (numeric in [0, 1]). The function internally renames these to `popn`, `locus`, and `frequency`.

pop.sizes

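A numeric vector of integer population sizes (the number of individuals to simulate per population), with one element **per unique population** in `df`, given in the order in which the populations first appear in the first column of `df` (that is, the order of `unique(df$popn)` after the internal renaming).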
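As an illustration of the input layout described above, the following base R sketch builds a small, made-up frequency table and a matching size vector. The population names, locus names, frequencies, and the helper variables `freqs`, `pop_order`, and `pop_sizes` are hypothetical; only the three-column layout and the ordering rule come from this page.

```r
# Hypothetical allele frequencies for two populations and three loci.
# Column order matters: population, locus, frequency of the first allele.
freqs <- data.frame(
  popn      = rep(c("popA", "popB"), each = 3),
  locus     = rep(c("L1", "L2", "L3"), times = 2),
  frequency = c(0.10, 0.50, 0.90, 0.25, 0.75, 1.00),
  stringsAsFactors = FALSE
)

# Basic checks mirroring the documented requirements.
stopifnot(
  ncol(freqs) == 3,
  is.numeric(freqs[[3]]),
  all(freqs[[3]] >= 0 & freqs[[3]] <= 1)
)

# Populations in order of first appearance; the sizes follow this order.
pop_order <- unique(freqs[[1]])
pop_sizes <- c(20, 30)  # 20 individuals for popA, 30 for popB
stopifnot(length(pop_sizes) == length(pop_order))

# With dartR.sim attached, the call would then look like this (not run here):
# res <- gl.sim.ind.af(df = freqs, pop.sizes = pop_sizes)
```

Checking `unique()` on the first column before calling the function is a cheap way to confirm that each size is paired with the intended population.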

Author

Custodian: Luis Mijangos -- Post to https://groups.google.com/d/forum/dartr

Details

The input `df` must have three columns: population name, locus name, and the frequency of the first allele for that population–locus combination. For each population, the function simulates two haploid chromosomes per individual by independently drawing alleles at each locus according to the provided allele frequency, then merges the two chromosomes into diploid genotypes (0, 1, 2 copies of the first allele). The procedure assumes Hardy–Weinberg proportions and linkage equilibrium (i.e., loci are sampled independently and there is no within-population structure beyond the supplied allele frequencies).
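The sampling scheme described above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's `Rcpp` implementation: the function name `sim_pop`, the use of `rbinom()`, and the example frequencies are all hypothetical choices made for the sketch.

```r
set.seed(42)  # reproducible draws

# Simulate n diploid individuals for a single population.
# p: first-allele frequencies, one value per locus (hypothetical input).
sim_pop <- function(n, p) {
  n_loci <- length(p)
  probs <- rep(p, each = n)  # column j of each matrix uses p[j]
  # Two independent haploid "chromosomes": 1 = first allele, 0 = other allele.
  chrom1 <- matrix(rbinom(n * n_loci, size = 1, prob = probs), nrow = n)
  chrom2 <- matrix(rbinom(n * n_loci, size = 1, prob = probs), nrow = n)
  # Merge into diploid genotypes: 0, 1, or 2 copies of the first allele.
  chrom1 + chrom2
}

p <- c(0.1, 0.5, 0.9)
geno <- sim_pop(n = 5000, p = p)

# Observed genotype proportions at locus 2 versus Hardy-Weinberg expectations
# for 0, 1, and 2 copies of the first allele: (1 - p)^2, 2p(1 - p), p^2.
obs <- table(factor(geno[, 2], levels = 0:2)) / nrow(geno)
exp_hw <- c((1 - p[2])^2, 2 * p[2] * (1 - p[2]), p[2]^2)
print(rbind(observed = as.numeric(obs), expected = exp_hw))
```

Because the two chromosomes are drawn independently with the same frequency, the genotype proportions approach the Hardy-Weinberg expectations as the number of individuals grows, and drawing each locus separately is what yields linkage equilibrium.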

Sex is assigned in alternating blocks of males and females and stored in the returned object as a factor with levels `"m"` and `"f"`. A placeholder phenotype of `"control"` is set for all individuals, and locus allele labels are initialized to the placeholder `"G/C"`. The construction of chromosomes and genotype strings is implemented with `Rcpp` for speed.

Examples

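# Keep only loci with no missing data and drop monomorphic loci
t1 <- gl.filter.callrate(platypus.gl, threshold = 1, mono.rm = TRUE)
# Allele frequencies for each population-by-locus combination
r1 <- gl.allele.freq(t1, by = "popxloc")
r2 <- r1[, c("popn", "locus", "frequency")]
# Simulate 50 individuals per population (one size per population)
res <- gl.sim.ind.af(df = r2, pop.sizes = c(50, 50, 50))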
