prepare.data(admix.gen=NULL, loci.data=NULL, parental1=NULL, parental2=NULL, pop.id=TRUE, ind.id=TRUE, fixed=FALSE, sep.rows=FALSE, sep.columns=FALSE)
parental1: genotype data for parental population 1 if fixed=FALSE, a single character if fixed=TRUE.
parental2: genotype data for parental population 2 if fixed=FALSE, a single character if fixed=TRUE.
pop.id: whether admix.gen includes a row specifying sampling localities.
ind.id: whether admix.gen includes a row specifying individual identifications.
The output includes the pop.id and ind.id data if they were supplied; a
component that is NULL if fixed=TRUE and otherwise provides the allelic
class data needed for genomic.clines; and Parental1.allele.freq and
Parental2.allele.freq
for each locus.
Genotypic data for individuals are provided in admix.gen, a data object
with genotypes for each individual at each locus in the format "A/D" or
"110/114" for co-dominant data, "A" or "hap1b" for haploid data, and "0"
or "1" for dominant data. In other words, for co-dominant and haploid
data, alleles can be encoded by any simple character string. Each row
should contain data for a locus and columns should correspond to
individuals. Missing data should be entered as "NA/NA" for co-dominant
data and as "NA" for haploid or dominant data.
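As a minimal sketch of the layout described above, the following base R code builds a small genotype matrix by hand. The locus and individual names are hypothetical and are only there to make the rows and columns easy to read.

```r
# Rows are loci, columns are individuals (names are hypothetical).
admix.gen <- matrix(
  c("A/D",     "A/A",     "NA/NA",    # co-dominant, third individual missing
    "110/114", "114/114", "110/110",  # co-dominant, allele sizes as labels
    "hap1b",   "hap1a",   "hap1b",    # haploid, one allele per individual
    "0",       "1",       "1"),       # dominant, band absent / present
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("locus", 1:4), paste0("ind", 1:3))
)

admix.gen
```

Every entry is stored as a character string, so numeric-looking allele labels such as "110/114" are kept exactly as typed.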
Alternatively, genotypic data for an individual in admix.gen can be split
between two rows (sep.rows = TRUE) or two columns (sep.columns = TRUE).
These options are similar to those of the data format for the program
structure (Pritchard et al. 2000, Falush et al. 2003), with the
difference that admix.gen is transposed relative to the input for
structure. Thus, after reading in a structure file, the data matrix can
be transposed with rawdata <- t(rawdata) before passing the matrix to
prepare.data. If genotype data are split across columns or rows, and
they include haploid or dominant markers, the second allele for these
markers should be recorded as NA.
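The transposition step can be illustrated with a small hand-made matrix. This sketch assumes a structure-style layout with two consecutive rows per individual and one column per locus; the object and dimension names are hypothetical. Rather than relying on sep.rows or sep.columns, it pastes the two allele copies into the default "allele/allele" strings, so it is independent of how those options are handled.

```r
# Hypothetical structure-style input: two rows per individual, one column per locus.
rawdata <- matrix(
  c("110", "A",
    "114", "D",
    "114", "D",
    "114", "D"),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("ind1", "ind1", "ind2", "ind2"), c("locus1", "locus2"))
)

# Transpose so that rows are loci and columns are allele copies.
rawdata <- t(rawdata)

# Adjacent columns now hold the two allele copies of one individual.
first  <- rawdata[, seq(1, ncol(rawdata), by = 2), drop = FALSE]
second <- rawdata[, seq(2, ncol(rawdata), by = 2), drop = FALSE]

# Combine the copies into "allele/allele" strings, one column per individual.
admix.gen <- matrix(paste(first, second, sep = "/"),
                    nrow = nrow(first), dimnames = dimnames(first))

admix.gen
```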
If pop.id = TRUE and ind.id = TRUE, the first row of admix.gen should
give the population identification (i.e. sampling locality) of each
individual and the second row should provide a unique individual
identification; genotype information then begins on row three.
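A short sketch of this row layout, using hypothetical locality and individual labels, stacks the two identification rows on top of a genotype matrix with rbind:

```r
# Genotypes only: rows are loci, columns are individuals.
genotypes <- matrix(c("A/D", "A/A",
                      "D/D", "A/D"),
                    nrow = 2, byrow = TRUE)

# Row 1: sampling locality, row 2: unique individual ID, row 3 onward: genotypes.
admix.gen <- rbind(c("siteA", "siteB"),   # hypothetical population labels
                   c("ind1",  "ind2"),    # hypothetical individual labels
                   genotypes)

admix.gen
```

A matrix built this way matches the pop.id = TRUE, ind.id = TRUE case; if the identification rows are left out, both arguments would be set to FALSE, as in the example at the end of this page.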
loci.data is a matrix or array data object where each row provides
information on one locus. The first column gives a unique locus name
(e.g. "locus3"), and the second column specifies whether the locus is
co-dominant ("C" or "c"), haploid ("H" or "h"), or dominant ("D" or
"d"). These first two columns in loci.data are required. The third
column, which is optional, is a numeric value specifying the linkage
group for the marker. If present, this column is used in the mk.image
function for plotting. The fourth column, which is also optional, is a
numeric value specifying both the linkage group and the location on the
linkage group (e.g. 3.70 for a marker at 70 cM on linkage group 3). This
last column can be used to generate a different order in which to use
marker data from admix.gen in other functions in the package (specified
in the marker.order argument to mk.image and clines.plot). Each column
in loci.data should have a heading (the second column should be named
"type").
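The four-column layout can be sketched as follows. Only the heading "type" is prescribed by the text above; the other column names (locus, lg, marker.pos) are hypothetical placeholders chosen for this illustration.

```r
# One row per locus; only the "type" heading is required by the documentation.
loci.data <- cbind(
  locus      = c("locus1", "locus2", "locus3"),  # unique locus names
  type       = c("C", "H", "D"),                 # co-dominant, haploid, dominant
  lg         = c(1, 1, 3),                       # linkage group (optional)
  marker.pos = c(1.10, 1.45, 3.70)               # group.position, e.g. 3.70 (optional)
)

loci.data
```

Because cbind mixes text and numbers, the result is a character matrix; the numeric columns can be recovered with as.numeric(loci.data[, "marker.pos"]).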
If the parental populations exhibit fixed differences for all markers
scored (i.e. fixed = TRUE), then parental1 and parental2 should give the
character used to specify alleles derived from parental populations one
and two, respectively (e.g. parental1 = "p1" and parental2 = "p2"). If
parental populations exhibit fixed differences at all loci, the count
matrix produced by prepare.data is simply a count of the number of
alleles inherited from parental population 1 for each individual at each
locus (0, 1, or 2 for co-dominant marker data; 0 or 1 for dominant or
haploid marker data).
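To make the meaning of these counts concrete, the sketch below tallies parental population 1 alleles for a few co-dominant genotypes in base R. It illustrates the definition only and is not the package's own code; the helper count.p1 and the example labels are hypothetical.

```r
parental1 <- "P1"  # label used for alleles from parental population 1

# Co-dominant genotypes with fixed differences (rows: loci, columns: individuals).
geno <- matrix(c("P1/P1", "P1/P2",
                 "P2/P2", "NA/NA"),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("locus1", "locus2"), c("ind1", "ind2")))

# Hypothetical helper: number of population 1 alleles in one genotype string.
count.p1 <- function(g) {
  if (g == "NA/NA") return(NA_integer_)       # missing genotype stays missing
  sum(strsplit(g, "/", fixed = TRUE)[[1]] == parental1)
}

counts <- apply(geno, c(1, 2), count.p1)
counts
```

Each cell ends up as 0, 1, or 2 for co-dominant data, with NA where the genotype was missing.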
If the parental populations do not exhibit fixed differences at all loci
scored (i.e. fixed = FALSE), then parental1 and parental2 should be
matrix data objects providing genotype data for individuals sampled from
each of the parental populations. These data objects should be in the
same format as admix.gen, with the difference that they should not
contain rows for individual and population identifications at the top.
prepare.data uses the parental data objects to calculate allele
frequencies at each locus for both of the parental populations. Alleles
are then binned into allelic classes with maximum (equal to the
observed) frequency differentials between parental populations
($\delta$, Gregorius and Roberds 1986). These allelic classes serve as
the basis for estimating the count matrix, which is in the same format
as described above. In the absence of fixed differences, the counts are
of alleles from the allelic class associated with population 1, and the
frequency of allelic classes in the parental species can be used to
account for uncertainty in the ancestry of particular alleles. See
Gompert and Buerkle (2009a, 2009b) for additional details.
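The binning idea can be sketched for a single locus in base R. This is an illustration under stated assumptions, not the package's implementation: it takes $\delta$ to be half the sum of absolute allele frequency differences between the two parental samples, and it assigns to the population 1 class every allele that is more frequent in population 1. The genotypes and the helper allele.freq are hypothetical.

```r
# Hypothetical co-dominant genotypes at one locus in each parental sample.
p1.geno <- c("A/A", "A/B", "A/A", "B/C")
p2.geno <- c("C/C", "B/C", "C/D", "C/C")

# All alleles observed in either sample, ignoring missing data.
alleles <- sort(unique(unlist(strsplit(c(p1.geno, p2.geno), "/", fixed = TRUE))))
alleles <- alleles[alleles != "NA"]

# Hypothetical helper: allele frequencies over the shared allele set.
allele.freq <- function(geno, alleles) {
  copies <- unlist(strsplit(geno, "/", fixed = TRUE))
  copies <- copies[copies != "NA"]
  tab <- table(factor(copies, levels = alleles))
  setNames(as.vector(tab) / length(copies), alleles)
}

f1 <- allele.freq(p1.geno, alleles)
f2 <- allele.freq(p2.geno, alleles)

# Assumed definition: delta = 1/2 * sum over alleles of |f1 - f2|.
delta <- sum(abs(f1 - f2)) / 2

# Alleles more common in population 1 form the population 1 class.
class1 <- alleles[f1 > f2]
class.diff <- sum(f1[class1]) - sum(f2[class1])

delta
class.diff
```

With this grouping the frequency difference between the two classes equals delta, which is the sense in which the binned classes retain the observed differential.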
Falush D., Stephens M., and Pritchard J. K. (2003) Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics, 164, 1567-1587.
Gompert Z. and Buerkle C. A. (2009a) A powerful regression-based method for admixture mapping of isolation across the genome of hybrids. Molecular Ecology, 18, 1207-1224.
Gompert Z. and Buerkle C. A. (2009b) introgress: a software package for mapping components of isolation in hybrids. Molecular Ecology Resources, in preparation.
Gregorius H. R. and Roberds J. H. (1986) Measurement of genetical differentiation among subpopulations. Theoretical and Applied Genetics, 71, 826-834.
Pritchard J. K., Stephens M., and Donnelly P. (2000) Inference of population structure using multilocus genotype data. Genetics, 155, 945-959.
delta, mk.image, genomic.clines, clines.plot
## Not run:
# ## load simulated data
# ## markers have fixed differences, with
# ## alleles coded as 'P1' and 'P2'
# data(AdmixDataSim1)
# data(LociDataSim1)
#
# ## use prepare.data to produce introgress.data
# introgress.data<-prepare.data(admix.gen=AdmixDataSim1,
# loci.data=LociDataSim1,
# parental1="P1", parental2="P2",
# pop.id=FALSE, ind.id=FALSE, fixed=TRUE)
#
# ## End(Not run)