MakeAgePrior: Age priors

Description

For various categories of pairwise relatives (R), calculate age-difference (A) based probability ratios \(P(A|R) / P(A)\).

Usage

MakeAgePrior(
  Pedigree = NULL,
  LifeHistData = NULL,
  MaxAgeParent = NULL,
  Discrete = NULL,
  Flatten = NULL,
  lambdaNW = -log(0.5)/100,
  Smooth = TRUE,
  Plot = TRUE,
  Return = "LR",
  quiet = FALSE
)

Arguments

Pedigree

dataframe with id - dam - sire in columns 1-3, and an optional column with birth years. Other columns are ignored.

LifeHistData

dataframe with 3 or 5 columns: id - sex (not used) - birth year (- BY.min - BY.max), with unknown birth years coded as negative numbers or NA. Column names are ignored, so the column order is important. "Birth year" may be in any arbitrary discrete time unit relevant to the species (day, month, decade), as long as parents are never born in the same time unit as their offspring. It may include individuals not in the pedigree, and not all individuals in the pedigree need to be in LifeHistData.

MaxAgeParent

maximum age of a parent, a single number (max across dams and sires) or a vector of length two (dams, sires). If NULL, it will be estimated from the data. If there are fewer than 20 parents of either sex assigned, MaxAgeParent is set to the maximum age difference in the birth year column of Pedigree or LifeHistData.

Discrete

Discrete generations? By default (NULL), discrete generations are assumed if all parent-offspring pairs have an age difference of 1, and all siblings an age difference of 0, and there are at least 20 pairs of each category (mother, father, maternal sibling, paternal sibling). Otherwise, overlapping generations are presumed. When Discrete=TRUE (explicitly or deduced), Smooth and Flatten are always automatically set to FALSE. Use Discrete=FALSE to enforce (potential for) overlapping generations.

Flatten

To deal with small sample sizes for some or all relationships, calculate a weighted average between the observed age-difference distribution among relatives and a flat (0/1) distribution. When Flatten=NULL (the default), it is automatically set to TRUE when there are fewer than 20 parents with known age of either sex assigned, or fewer than 20 maternal or paternal siblings with known age difference. Flattening is also advisable if the sampled relative pairs with known age difference are not typical of the pedigree as a whole.

lambdaNW

Controls the weighting factors when Flatten=TRUE. Weights are calculated as \(W(R) = 1 - exp(-lambdaNW * N(R))\), where \(N(R)\) is the number of pairs with relationship R for which the age difference is known. Large values (>0.2) put strong emphasis on the pedigree, while small values (<0.0001) cause the pedigree to be ignored. The default results in \(W=0.5\) for \(N=100\).
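
As a minimal sketch of the weighting formula above, the base R snippet below computes \(W(R)\) for a few pair counts with the default lambdaNW, then blends an observed distribution with a flat one. The blend `W * observed + (1 - W) * flat` is an assumption about the form of the weighted average, not taken from the package source, and all counts and ratios are made up for illustration.

```r
# Default lambdaNW, as in the Usage section
lambdaNW <- -log(0.5) / 100

# Hypothetical numbers of pairs with known age difference, per relationship
N <- c(M = 10, P = 100, FS = 500, MS = 0)

# W(R) = 1 - exp(-lambdaNW * N(R)); W is 0.5 at N = 100 and 0 at N = 0
W <- 1 - exp(-lambdaNW * N)
print(round(W, 3))

# Assumed form of the weighted average for one relationship:
# hypothetical observed ratios for age differences 0..3, and a flat 0/1 vector
observed  <- c(0, 2.4, 1.1, 0.2)
flat      <- c(0, 1, 1, 1)
flattened <- W[["M"]] * observed + (1 - W[["M"]]) * flat
print(flattened)
```

With only 10 pairs the weight is small, so the flattened values stay close to the flat distribution; with several hundred pairs the observed distribution dominates.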

Smooth

Smooth the tails of the distribution and any dips in it? Sets dips (<10% of average of neighbouring ages) to the average of the neighbouring ages, sets the age after the end (oldest observed age) to LR(end)/2, and assigns a small value (0.001) to the ages before the front (youngest observed age) and after the new end. Peaks are not smoothed out, as these are less likely to cause problems than dips, and are more likely to be genuine characteristics of the species. This is set to FALSE when generations do not overlap (Discrete=TRUE).

Plot

plot a heatmap of the results? Only applies when Pedigree is provided.

Return

return only a matrix with the likelihood ratio \(P(A|R) / P(A)\) ("LR"), or a list that also includes various intermediate statistics ("all")?

quiet

suppress messages

Value

A matrix with the probability ratio of the age difference between two individuals conditional on them being a certain type of relative (\(P(A|R)\)) versus being a random draw from the sample (\(P(A)\)). For siblings and avuncular pairs, this is the absolute age difference.

The matrix has one row per age difference (0 - nAgeClasses) and five columns, one for each relationship type, with abbreviations:

M

Mothers

P

Fathers

FS

Full siblings

MS

Maternal half-siblings

PS

Paternal half-siblings

When Return='all', a list is returned that contains this matrix ('LR.RU.A') and the following additional elements:

BirthYearRange

vector length 2

MaxAgeParent

single number, estimated from the data or provided

tblA.R

matrix with the counts per age difference (0 - nAgeClasses) and the five relationship types as for 'LR.RU.A', plus a column 'X' with age differences across all pairs of individuals, including those in LifeHistData but not in Pedigree.

Weights

vector length 4, the weights used to flatten the distributions

LR.RU.A.unweighed

matrix with nAgeClasses+1 rows and 5 columns; LR.RU.A prior to flattening and smoothing

Specs.AP

the names of the input Pedigree and LifeHistData (or NULL), the 'effective' settings of Discrete, Smooth, and Flatten, and the value of lambdaNW

CAUTION

The small sample correction with Smooth and/or Flatten prevents errors in one dataset, but may introduce errors in another; a single solution that fits the wide variety of life histories and datasets is impossible. Please do inspect the matrix, e.g. with PlotAgePrior.

Single cohort

When no birth year information is given, or all individuals have the same birth year, it is assumed that a single cohort has been analysed and a matrix with 0's and 1's is returned. When Discrete=FALSE, avuncular pairs are assumed potentially present, while when Discrete=TRUE avuncular is not considered as a relationship possibility.

Other time units

"Birth year" may be in any arbitrary time unit relevant to the species (day, month, decade), as long as parents are never born in the same time unit as their putative offspring, but always in an earlier one (e.g. parent's BirthYear=1 (or 2001) and offspring BirthYear=5 (or 2005)). Negative numbers and NAs are interpreted as unknown, and fractional numbers are not allowed.
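
As an illustration of using another time unit, the base R sketch below converts calendar dates into whole months counted from a reference point, so that the values are positive integers and unknown dates stay NA. The dates, the reference year 2000 and the variable names are all hypothetical; whether months are a suitable unit depends on the species.

```r
# Hypothetical birth dates; NA marks an unknown birth date
birth_date <- as.Date(c("2001-03-15", "2001-11-02", "2003-06-20", NA))

# Whole months counted from January 2000 (= 1), chosen to lie before the
# earliest birth so that all known values are positive integers
yr <- as.integer(format(birth_date, "%Y"))
mo <- as.integer(format(birth_date, "%m"))
birth_month <- (yr - 2000L) * 12L + mo
print(birth_month)

# Simple checks against the rules above: whole, non-negative numbers or NA
stopifnot(is.integer(birth_month))
stopifnot(all(birth_month >= 0, na.rm = TRUE))
```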

Maximum parental age

The number of rows in the output ageprior matrix equals the maximum parental age +1 (the first row is for age difference 0). The maximum parental age equals:

the maximum age of parents if a pedigree is provided, or
the (largest) value of MaxAgeParent, or
1, if generations are discrete, or
the maximum range of birth years in LifeHistData (including BY.min and BY.max, when provided)

The exception is when MaxAgeParent is larger than the maximum age of parents in the provided skeleton pedigree, in which case MaxAgeParent is used. Thus, MaxAgeParent can be used when the birth year range in LifeHistData and/or the age distribution of assigned parents does not capture the absolute maximum age of parents. Not adjusting this may hinder subsequent assignment of both dummy parents and grandparents.

Details

The ratio \(P(A|R) / P(A)\) is the ratio between the observed counts of pairs with age difference A and relationship R (\(N_{A,R}\)), and the expected counts if age and relationship were independent (\(N_{.,.}*p_A*p_R\)).
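
The ratio described above can be sketched in base R from a small table of counts. The counts below are made up, and the column names 'M' and 'X' merely mirror the layout described for tblA.R; this is an illustration of the arithmetic, not the package's implementation.

```r
# Hypothetical counts per age difference (rows 0..3): mother-offspring
# pairs (M) and all pairs of individuals (X)
tbl <- cbind(M = c(0, 30, 15, 5),
             X = c(400, 300, 200, 100))
rownames(tbl) <- 0:3

# P(A|R): age-difference distribution among pairs with relationship R
P_A_given_R <- tbl[, "M"] / sum(tbl[, "M"])
# P(A): age-difference distribution among all pairs
P_A <- tbl[, "X"] / sum(tbl[, "X"])

# Ratio per age difference 0..3: 0, 2, 1.5, 1
LR <- P_A_given_R / P_A
print(LR)
```

A ratio above 1 means the age difference is more common among pairs with that relationship than among pairs in general; a ratio of 0 excludes the relationship for that age difference.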

During pedigree reconstruction, the ratios \(P(A|R) / P(A)\) calculated here are multiplied by the age-independent genetic-only \(P(R|G)\) to obtain a probability that the pair are relatives of type R conditional on both their age difference and their genotypes (i.e. using Bayes' theorem, \(P(R|A, G) =P(A|R) / P(A) * P(R|G)\)).
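
A small numeric sketch of this multiplication, in base R. All values are hypothetical, 'U' (unrelated, with ratio 1) is included only for illustration, and the final rescaling to sum to 1 is an assumption of the sketch rather than a documented step.

```r
# Hypothetical age-prior ratios P(A|R) / P(A) for one pair's age difference
LR_age <- c(M = 2.0, FS = 0.5, U = 1.0)
# Hypothetical genetic-only probabilities P(R|G) for the same pair
P_R_G <- c(M = 0.60, FS = 0.30, U = 0.10)

# Combine age and genetic information per relationship
combined <- LR_age * P_R_G

# Rescale so the values sum to 1 over the relationships considered
# (assumed normalisation, for readability of the result)
posterior <- combined / sum(combined)
print(round(posterior, 3))
```

Here the age difference makes 'M' more plausible and 'FS' less plausible than the genotypes alone suggest.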

The age-difference prior is used for pairs of genotyped individuals, as well as for dummy individuals. This assumes that the propensity for a pair with a given age difference to both be sampled does not depend on their relationship, so that the ratio \(P(A|R) / P(A)\) does not differ between sampled and unsampled pairs.

Examples

# NOT RUN {
data(LH_HSg5, Ped_HSg5, package="sequoia")

# no pedigree available:
MakeAgePrior(LifeHistData = LH_HSg5)
MakeAgePrior(LifeHistData = LH_HSg5, Discrete=TRUE)
MakeAgePrior(LifeHistData = LH_HSg5, MaxAgeParent = c(2,3))
# }
# NOT RUN {
# with pedigree:
MakeAgePrior(Pedigree=Ped_HSg5[1:100,], LifeHistData = LH_HSg5)
MakeAgePrior(Ped_HSg5[1:100,], LH_HSg5, Discrete=FALSE)
# With 'Flatten', the value depends on the number of pairs per relationship:
MakeAgePrior(Ped_HSg5[1:100,], LH_HSg5, Flatten=TRUE)
AP.all <- MakeAgePrior(Ped_HSg5[1:200,], LH_HSg5, Flatten=TRUE)
AP.all$tblA.R
# }