MakeAgePrior: Age priors

Description

Calculate age-difference based prior probability ratios $P(A|R) / P(A)$ for various categories of pairwise relatives.

Usage

MakeAgePrior(Ped = NULL, LifeHistData = NULL, MaxAgeParent = NULL,
  Flatten = TRUE, lambdaNW = -log(0.5)/100, Smooth = TRUE,
  Plot = TRUE, Return = "LR", quiet = FALSE)

Arguments

Ped

dataframe with pedigree, with id - dam - sire in columns 1-3, and optional column with birth years. Other columns are ignored.

LifeHistData

dataframe with columns id - sex (not used) - birth year (unknown: negative number or NA). Column names are ignored, so the column order is important. "Birth year" may be in any arbitrary discrete time unit relevant to the species (day, month, decade), as long as parents are never born in the same time unit as their offspring.

MaxAgeParent

maximum age of a parent (max across dams and sires). If NULL, will be estimated from the data. If there are fewer than 100 parents of either sex assigned, MaxAgeParent is set to the maximum age difference in the birth year column of Ped or LifeHistData.

Flatten

Calculate weighed average between the observed age difference distribution among the relative pairs with known age difference and a completely flat distribution. The weights are a function of the number of pairs and lambdaNW, see Details. Advisable if the relative pairs with known age difference non-typical of the pedigree as a whole or when their number is limited, and therefore automatically set to TRUE when there are fewer than 20 parents of either sex assigned. Not advisable when generations do not overlap.

lambdaNW

When Flatten=TRUE, weighing factors of the data-estimated age-difference distribution versus a flat distribution are calculated as $W(R) = 1 - exp(-lambdaNW * N(R))$, where $N(R)$ is the number of pairs with relationship R for which the age difference is known. See Details below.

Smooth

Smooth the tails of and any dips in the distribution? Sets dips (<10% of average of neighbouring ages) to the average of the neigbouring ages, sets the age after the end (oldest observed age) to LR(end)/2, and assigns a small value (0.001) to the ages before the front (youngest observed age) and after the new end. Set to FALSE when generations do not overlap.

Plot

plot a 2-panel overview of the results?

Return

return only a matrix with the likelihood-ratio $P(A|R) / P(A)$ ("LR") or a list including also various intermediate statistics ("all") ?

quiet

suppress messages

Value

A matrix with the probability ratio of the age difference between two individuals conditional on them being a certain type of relative ($P(A|R)$) versus being a random draw from the sample ($P(A)$). For siblings and avuncular pairs, this is the absolute age difference.

The matrix has one row per age difference (0 - nAgeClasses) and nine columns, one for each relationship type, with abbreviations:

Mothers

Fathers

Full siblings

Maternal half-siblings

Paternal half-siblings

MGM

Maternal grandmother

PGF

Paternal grandfather

MGF

Maternal grandfathers and paternal grandmothers

Avuncular (aunt/uncle - niece/nephew)

When Return='all', a list is returned with in addition to this matrix ('LR.RU.A') the following elements:

BirthYearRange

vector length 2

MaxAgeParent

single number, as estimated from the data or provided

tblA.R

matrix with the counts per age difference (0 - nAgeClasses) and the five relationship types as for 'LR.RU.A', plus a column 'X' with age differences across all pairs of individuals, including those in LifeHistData but not in Ped.

Weights

vector length 4, the weights used to flatten the distributions

LR.RU.A.unweighed

matrix with nAgeClasses+1 rows and 9 columns; LR.RU.A prior to flattening and smoothing

Discrete generations

When generations do not overlap, Flatten and Smooth should both be set to FALSE. This is done automatically when it is detected that all parents are 1 year older than their offspring, and siblings are always born in the same year.

Single cohort

When no birth year information is given, or all individuals have the same birthyear, it is assumed that a single cohort has been analysed and a simple 2 by 9 matrix with 0's and 1's is returned. Note that by default it is assumed that no avuncular pairs exist within this single cohort; if they may exist, change LR.RU.A["0","UA"] from 0 to 1.

Other time units

"Birth year" may be in any arbitrary time unit relevant to the species (day, month, decade), as long as parents are never born in the same time unit as their offspring. Time of birth/hatching/germination must be expressed as whole, positive numbers.

Details

Using Bayes' theorem, $$P(Relationship | Age difference) = P(Age difference | Relationship) / P(Age difference) * P(Relationship)$$ or for short $P(R|A) =P(A|R) / P(A) * P(R)$. During pedigree reconstruction, the ratios $P(A|R) / P(A)$ calculated here are multiplied by the age-independent genetic-only $P(Relationship)$ to obtain a probability that the pair are relatives of type R conditional on both their age difference and their genotypes. This age-difference prior is used not only for pairs of genotyped individuals, but also between genotyped and dummy individuals and between pairs of dummy individuals. Therefore, $P(Age difference)$ is taken in the broadest sense possible and calculated across all pairs of individuals in Ped and LifeHistData. When P(A) is 0 but The resulting distribution is by default flattened and smoothed to account for finite sample size.

Sometimes no age information is available for one or more relative categories, for example when all candidate fathers are unsampled and/or of unknown age. To account for the amount of information available, if Flatten=TRUE a weighed average of $P(A|R)/P(A)$ as estimated from the data and a completely flat distribution is returned. The weighing factor $W(R)$ for each relationship is a function of the number of relative pairs with a known age difference $N(R)$, following the sigmoid curve $W(R) = 1 - exp(-lambdaNW * N(R))$. By default, $lambdaNW=-log(0.5)/100$, corresponding to a weight of <50% if there are <100 pairs, and >50% if there are >100 pairs. See example below for a plot of this curve.

Examples

Run this code

# NOT RUN {
data(LH_HSg5, Ped_HSg5, package="sequoia")
# }
# NOT RUN {
MakeAgePrior(Ped_HSg5, LH_HSg5, Flatten=FALSE, Smooth=FALSE)
APlist <- MakeAgePrior(Ped_HSg5, LH_HSg5, Flatten=FALSE, Smooth=TRUE,
  Return="all") 
# }
# NOT RUN {
# effect of lambdaNW on weights when Flatten=TRUE:
lambda1 <- -log(0.5)/100
lambda2 <- -log(1-.9)/100
curve(1-exp(-lambda1*x), lty=2, lwd=2, from=1, to=1000, log="x",
      xlab="N known age difference", ylab="W", las=1)
curve(1-exp(-lambda2*x), lty=2, lwd=2, col="blue", add=TRUE)
abline(h=c(0,.1,.25,.5,.75,.9,1), col="grey", lty=3)

N = c(1,10,42,100,200, 432, 1e3, 1e4)
data.frame(N = N,
           W1 = round(1-exp(-lambda1*N), 3),
           W2 = round(1-exp(-lambda2*N), 3))

# }

Run the code above in your browser using DataLab