getXlist: Design Matrices for the Multinomial Log-Linear Model

Description

Forms design matrices for each offspring, and stores other relevant information.

Usage

getXlist(PdP, GdP=NULL, A=NULL, E1=0.005, E2=0.005, mm.tol=999)

Arguments

PdP

PdataPed object

GdP

optional GdataPed object

A

optional list of allele frequencies. If not specified and GdP exists, allele frequencies are taken from GdP$G using extractA

E1

If Wang's (2004) model of genotyping error for co-dominant markers is used, this is the probability of an allele dropping out. If CERVUS's (Kalinowski, 2006; Marshall, 1998) model of genotyping error for co-dominant markers is used, this parameter is not used. If Hadfield's (2009) model of genotyping error for dominant markers is used, this is the probability of a dominant allele being scored as a recessive allele.

E2

If Wang's (2004) or CERVUS's (Kalinowski, 2006; Marshall, 1998) model of genotyping error for co-dominant markers is used, this is the probability of an allele being mis-scored. In the CERVUS model, errors are not independent for the two alleles within a genotype: if a genotyping error has occurred at one allele, then a genotyping error occurs at the other allele with probability one. Accordingly, E2(2-E2) is the per-genotype rate defined in CERVUS. If Hadfield's (2009) model of genotyping error for dominant markers is used, this is the probability of a recessive allele being scored as a dominant allele.

mm.tol

maximum number of genotype mismatches tolerated for potential parents
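The relationship between E2 and the CERVUS per-genotype rate stated above can be checked with a few lines of arithmetic. This is only an illustrative sketch in base R: the variable name per_genotype_rate is made up for the example and is not part of the package.

```r
# Default per-allele mis-scoring probability from the Usage section
E2 <- 0.005

# Per-genotype rate as given in the argument description: E2(2-E2)
# (per_genotype_rate is a hypothetical name used only in this sketch)
per_genotype_rate <- E2 * (2 - E2)

# Algebraically, E2(2-E2) = 2*E2 - E2^2 = 1 - (1-E2)^2
alternative_form <- 1 - (1 - E2)^2

per_genotype_rate
all.equal(per_genotype_rate, alternative_form)
```

With the default E2 of 0.005, the per-genotype rate is therefore slightly less than twice the per-allele value.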

Value

id

vector of unique identifiers taken from PdP

beta_map

index relating the vector of unique parameters to the columns of the design matrices

X

list of design matrices and other information.

Details

This is the main R routine for setting up design matrices for the various models that may be defined in the formula argument of PdataPed. If a GdataPed object is passed to getXlist, design matrices of genetic likelihoods are calculated (see fillX.G), and the number of mismatches between offspring and parental genotypes is stored (see mismatches). mm.tol specifies the maximum number of mismatches that are tolerated between an offspring and a parent. Parents that exceed this number of mismatches are excluded, and the design matrices for non-excluded parents are reordered by the number of mismatches. This increases the efficiency of sampling from the multinomial distribution of parents, because high-probability parents appear first.
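The exclusion and reordering step described above can be pictured with a small base R sketch. It does not call the package and is not the internal implementation: the mismatch counts, the sire labels, and the names mm, keep and ordered are all hypothetical, and the way getXlist itself handles ties is not documented here.

```r
# Hypothetical mismatch counts between one offspring and five candidate sires
# (illustrative values only; not produced by MasterBayes)
mm <- c(s1 = 2, s2 = 0, s3 = 5, s4 = 1, s5 = 0)

# Tolerance, playing the role of the mm.tol argument
mm.tol <- 2

# Exclude candidates that exceed the tolerated number of mismatches
keep <- mm[mm <= mm.tol]

# Reorder the remaining candidates so that those with the fewest
# mismatches (typically the most probable parents) come first
ordered <- keep[order(keep)]

names(ordered)
```

In this toy case s3 is dropped, and the candidates with zero mismatches move to the front of the ordering.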

References

Hadfield J.D. et al. (2006) Molecular Ecology 15 3715-31

Kalinowski S.T. et al. (2006) Molecular Ecology in press

Hadfield J.D. et al. (2007) in prep

Examples

# NOT RUN {
library(MasterBayes)  # provides PdataPed, varPed and getXlist

id<-1:20
sex<-sample(c("Male", "Female"),20, replace=TRUE)
offspring<-c(rep(0,18),1,1)
lat<-rnorm(20)
long<-rnorm(20)
mating_type<-gl(2,10, labels=c("+", "-"))

test.data<-data.frame(id, offspring, lat, long, mating_type, sex)

res1<-expression(varPed("offspring", restrict=0))
var1<-expression(varPed(c("lat", "long"), gender="Male", 
  relational="OFFSPRING"))
var2<-expression(varPed(c("mating_type"), gender="Female", 
  relational="MATE"))
var3<-expression(varPed("mating_type", gender="Male"))

PdP<-PdataPed(formula=list(res1, var1, var2, var3), data=test.data)

X.list<-getXlist(PdP)
X.list$X$"19"$XSs

# For the first offspring we have the design matrix for sires
# The first column represents the distance between each male 
# and each offspring. The second column indicates the male's 
# mating type. Note that contrasts are set up with the first 
# male so the indicator variables may be negative.

matrix(X.list$X$"19"$XDSs, ncol=length(X.list$X$"19"$dam.id), 
   nrow=length(X.list$X$"19"$sire.id))

# incidence matrix indicating whether Females (columns) and Males (rows)
# are the same mating type. Again this is a contrast with the first 
# parental combination (which is +/+) so 0 actually represents parents
# with the same mating type.
# }