
sequoia (version 2.0.7)

CalcOHLLR: calculate OH and LLR

Description

Count opposite homozygous (OH) loci between parent-offspring pairs and Mendelian errors (ME) between parent-parent-offspring trios, and calculate the parental log-likelihood ratios (LLR).

Usage

CalcOHLLR(
  Pedigree = NULL,
  GenoM = NULL,
  CalcLLR = TRUE,
  LifeHistData = NULL,
  AgePrior = FALSE,
  Err = 1e-04,
  ErrFlavour = "version2.0",
  Tassign = 0.5,
  Complex = "full",
  GDX = TRUE,
  quiet = FALSE
)

Arguments

Pedigree

dataframe with columns id-dam-sire. May include non-genotyped individuals, which will be treated as dummy individuals.

GenoM

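numeric genotype matrix: one row per individual (with the IDs as rownames), one column per SNP, with genotypes coded as 0, 1 or 2 and missing values as -9. See GenoConvert to convert other formats.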

CalcLLR

calculate log-likelihood ratios for all assigned parents (genotyped + dummy/non-genotyped; parent vs. otherwise related). If FALSE, only the numbers of mismatching SNPs are counted (OH & ME), and the parameters LifeHistData, AgePrior, Err, Tassign, and Complex are ignored. Note that calculating likelihood ratios is much more time consuming than counting OH & ME.

LifeHistData

Dataframe with columns ID - Sex - BirthYear, and optionally columns BY.min and BY.max. If provided, used to delimit possible alternative relationships.

AgePrior

logical (TRUE/FALSE) to estimate the ageprior from Pedigree and LifeHistData, or an agepriors matrix (see MakeAgePrior). Affects which alternative relationships are considered (only those where \(P(A|R) / P(A) > 0\)). When TRUE, MakeAgePrior is called using its default values.

Err

estimated genotyping error rate, as a single number or 3x3 matrix. If a matrix, this should be the probability of observed genotype (columns) conditional on actual genotype (rows). Each row must therefore sum to 1.

ErrFlavour

function that takes Err as input and returns a 3x3 matrix of observed (columns) conditional on actual (rows) genotypes, or one of the built-in flavours used in sequoia: 'version2.0', 'version1.3', or 'version1.1'. Ignored if Err is a matrix. See ErrToM.

Tassign

used to determine whether or not to consider some more exotic relationships when Complex="full".

Complex

determines which relationships are considered as alternatives. Either "full" (default), "simp" (simplified, ignores inbred relationships), or "mono" (monogamous).

GDX

call getAssignCat to classify individuals as genotyped (G), substitutable by a dummy (D) or neither (X).

quiet

logical, suppress messages
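As a sketch of what ErrFlavour expects, the function below takes Err and returns a 3x3 matrix of observed (columns) conditional on actual (rows) genotypes, with each row summing to 1. The error model is a hypothetical one chosen for illustration (each wrong genotype is observed with probability Err/2); it is not one of sequoia's built-in 'version2.0', 'version1.3' or 'version1.1' flavours, and the name MyErrFlavour is made up.

```r
# Hypothetical error model for illustration only; NOT one of sequoia's
# built-in flavours. Rows: actual genotype; columns: observed genotype.
MyErrFlavour <- function(Err) {
  # Assumption: each of the two wrong genotypes is observed with prob. Err / 2
  M <- matrix(Err / 2, nrow = 3, ncol = 3,
              dimnames = list(actual = c("0", "1", "2"),
                              observed = c("0", "1", "2")))
  diag(M) <- 1 - Err   # correct genotype observed with prob. 1 - Err
  M
}

ErrM <- MyErrFlavour(1e-4)
print(ErrM)

# Each row is a probability distribution over observed genotypes
stopifnot(isTRUE(all.equal(unname(rowSums(ErrM)), rep(1, 3))))
```

A function like this could be passed as ErrFlavour = MyErrFlavour; alternatively, a matrix such as ErrM could be given directly as Err, in which case ErrFlavour is ignored.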

Value

the Pedigree dataframe with additional columns:

LLRdam

Log10-Likelihood Ratio (LLR) of this female being the mother, versus the next most likely relationship between the focal individual and this female (see Details for relationships considered)

LLRsire

idem, for male parent

LLRpair

LLR for the parental pair, versus the next most likely configuration between the three individuals (with one or neither parent assigned)

OHdam

Number of loci at which the offspring and mother are opposite homozygotes

OHsire

idem, for father

MEpair

Number of Mendelian errors between the offspring and the parent pair. This includes OH as well as, e.g., the parents being opposite homozygotes while the offspring is not heterozygous. The offspring being OH with both parents is counted as 2 errors.

SNPd.id.dam

Number of SNPs scored (non-missing) for both individual and dam

SNPd.id.sire

Number of SNPs scored for both individual and sire

id.cat

Character denoting whether the focal individual is genotyped (G), substitutable by a dummy (D), or neither (X).

dam.cat

as id.cat, for dams. If id.cat and/or dam.cat is 'X', the dam cannot be assigned.

sire.cat

as dam.cat, for sires

Sexx

Sex as given in LifeHistData, or the inferred sex when the individual is assigned as part of a parent pair

BY.est

mode of birth year probability distribution

BY.lo

lower limit of 95% highest density region of birth year probability distribution

BY.hi

upper limit of the 95% highest density region

The columns 'LLRdam', 'LLRsire' and 'LLRpair' are only included when CalcLLR=TRUE. The columns 'dam.cat' and 'sire.cat' are only included when GDX=TRUE. The columns 'Sexx', 'BY.est', 'BY.lo' and 'BY.hi' are only included when LifeHistData is provided and at least one genotyped individual has an unknown birth year or unknown sex.
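The counts in OHdam/OHsire, MEpair and SNPd.id.dam can be illustrated with plain base R. This is a minimal sketch, not sequoia's implementation: it assumes genotypes coded as 0, 1, 2 (copies of one allele) with -9 for missing, skips loci where any trio member is missing, and counts Mendelian errors per locus as the minimum number of offspring alleles that must be wrong given what the parents can transmit, so that being OH with both parents counts as 2. The helper names CountOH, CountSNPd and CountME are hypothetical.

```r
# Assumed coding: 0/1/2 copies of one allele, -9 = missing.

# Opposite homozygous loci between an offspring and one parent
CountOH <- function(off, par) {
  scored <- off != -9 & par != -9
  sum(scored & abs(off - par) == 2)
}

# Loci scored (non-missing) for both individuals
CountSNPd <- function(off, par) sum(off != -9 & par != -9)

# Mendelian errors for an offspring-dam-sire trio: per locus, the minimum
# number of offspring alleles that must be wrong (OH with both parents = 2).
# Assumption: loci with any missing genotype in the trio are skipped.
CountME <- function(off, dam, sire) {
  # alleles (0 or 1 copies) each parental genotype can transmit
  trans <- list("0" = 0, "1" = c(0, 1), "2" = 1)
  me <- 0
  for (l in seq_along(off)) {
    if (off[l] == -9 || dam[l] == -9 || sire[l] == -9) next
    possible <- outer(trans[[as.character(dam[l])]],
                      trans[[as.character(sire[l])]], "+")
    me <- me + min(abs(off[l] - possible))
  }
  me
}

off  <- c(0, 2, 1, 1,  2, -9)
dam  <- c(2, 0, 0, 1,  2,  0)
sire <- c(2, 0, 2, 0, -9,  1)

CountOH(off, dam)        # 2  (loci 1 and 2)
CountSNPd(off, dam)      # 5
CountME(off, dam, sire)  # 4  (loci 1 and 2 are OH with both parents)
```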

Details

Any individuals in Pedigree that do not occur in GenoM are substituted by dummy individuals; a value of '0' in column 'SNPd.id.dam' in the output means that either the focal individual or the dam was thus substituted, or both were. Use getAssignCat to distinguish between these cases.
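The effect described above can be mimicked with a toy genotype matrix: an ID that is not among the rownames of GenoM contributes no scored SNPs, so the pair's SNP count is 0 whether the focal individual, the dam, or both are absent. The matrix, the pedigree and the helper GetGeno are hypothetical, and the simple %in% check only shows which individual is absent from GenoM; getAssignCat additionally determines whether a non-genotyped individual can be substituted by a dummy.

```r
# Hypothetical genotype matrix: rows = individuals (IDs as rownames),
# columns = SNPs, coded 0/1/2 with -9 for missing.
GenoM <- matrix(c(0, 1, 2,  1,
                  1, 1, 2, -9,
                  2, 0, 1,  1),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("A", "B", "C"), paste0("SNP", 1:4)))

Ped <- data.frame(id = c("A", "C", "E"), dam = c("B", "D", "B"),
                  sire = NA, stringsAsFactors = FALSE)

# An individual absent from GenoM is treated as all-missing, like a dummy
GetGeno <- function(id, GenoM) {
  if (id %in% rownames(GenoM)) GenoM[id, ] else rep(-9, ncol(GenoM))
}

Ped$SNPd.id.dam <- mapply(function(i, d) {
  sum(GetGeno(i, GenoM) != -9 & GetGeno(d, GenoM) != -9)
}, Ped$id, Ped$dam, USE.NAMES = FALSE)

# Which member of each pair is absent from GenoM
Ped$id.in.GenoM  <- Ped$id %in% rownames(GenoM)
Ped$dam.in.GenoM <- Ped$dam %in% rownames(GenoM)
print(Ped)   # SNPd.id.dam: 3 0 0
```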

The birth years in LifeHistData and the AgePrior are not used in the calculation and do not affect the value of the likelihoods for the various relationships, but they _are_ used during some filtering steps, and may therefore affect the likelihood _ratio_. The default (AgePrior=FALSE) assumes all age-relationship combinations are possible, which may mean that some additional alternatives are considered compared to the sequoia default, resulting in somewhat lower LLR values.
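Why extra alternatives lower the LLR follows from its definition as the log10-likelihood of the parent-offspring relationship minus that of the most likely alternative considered. The sketch below uses made-up log10-likelihoods and made-up age-prior ratios P(A|R)/P(A); only the filtering step (alternatives with a ratio of 0 are dropped) and the subtraction mirror the text. The function name CalcLLR_PO and the relationship labels are hypothetical.

```r
# Made-up log10-likelihoods of one pair under several relationships:
# PO = parent-offspring, FS = full sibs, HS = half sibs,
# GP = grandparent, U = unrelated (labels are illustrative only).
LL <- c(PO = -120.3, FS = -123.1, HS = -125.0, GP = -121.4, U = -140.2)

# LLR of PO versus the most likely alternative among those considered
CalcLLR_PO <- function(LL, considered) {
  alt <- setdiff(considered, "PO")
  unname(LL["PO"] - max(LL[alt]))
}

# Made-up age-prior ratios P(A|R)/P(A) for this pair's age difference;
# relationships with ratio 0 are not considered as alternatives.
AgeRatio <- c(PO = 1.2, FS = 0.8, HS = 1.0, GP = 0, U = 1)

with_ages    <- CalcLLR_PO(LL, names(AgeRatio)[AgeRatio > 0])
without_ages <- CalcLLR_PO(LL, names(LL))   # all alternatives considered

with_ages     # 2.8  (PO vs FS)
without_ages  # 1.1  (PO vs GP)
```

Dropping GP as an alternative because of the ages raises the LLR; considering every relationship, as with AgePrior = FALSE, can only keep the LLR the same or lower it.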

A negative LLR for A's parent B indicates either that B is not truly the parent of A, or that B's own parents are incorrect. The latter may cause B's presumed true, unobserved genotype to deviate greatly from its observed genotype, with downstream consequences for its offspring. In rare cases it may also be due to 'weird', non-implemented double or triple relationships between A and B.
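To see why an incorrect parent gives a negative LLR, here is a toy log10-likelihood ratio of parent-offspring versus unrelated, summed over loci. It is far simpler than the likelihoods in CalcOHLLR: it assumes Hardy-Weinberg equilibrium with known allele frequencies, treats the parent's observed genotype as correct, allows only for error in the offspring's genotype (with a hypothetical symmetric error model), and compares against 'unrelated' only rather than against the most likely alternative. The function ToyLLR is hypothetical and its values will not match CalcOHLLR output.

```r
# Toy LLR of parent-offspring (PO) vs unrelated (U), NOT sequoia's model.
# Assumptions: HWE, known frequency q of the counted allele, parent's
# observed genotype taken as correct, symmetric error on the offspring.
# Genotypes coded 0/1/2, -9 = missing.
ToyLLR <- function(off, parent, q, Err = 1e-4) {
  E <- matrix(Err / 2, 3, 3)   # rows: actual, columns: observed genotype
  diag(E) <- 1 - Err
  llr <- 0
  for (l in seq_along(off)) {
    if (off[l] == -9 || parent[l] == -9) next
    tr <- parent[l] / 2   # prob. parent transmits the counted allele
    p  <- q[l]            # prob. the other allele is the counted one
    # probabilities of true offspring genotype 0, 1, 2
    P_po <- c((1 - tr) * (1 - p), tr * (1 - p) + (1 - tr) * p, tr * p)
    P_u  <- c((1 - p)^2, 2 * p * (1 - p), p^2)
    obs  <- off[l] + 1    # column of the observed offspring genotype
    llr  <- llr + log10(sum(P_po * E[, obs])) - log10(sum(P_u * E[, obs]))
  }
  llr
}

q      <- rep(0.5, 6)
parent <- c(0, 2, 0, 2, 1, 1)
ToyLLR(off = c(0, 2, 1, 1, 1, 0), parent = parent, q = q)  # about 0.6
ToyLLR(off = c(2, 0, 2, 0, 1, 1), parent = parent, q = q)  # 4 OH loci: about -14.8
```

Each of the four opposite homozygous loci in the second call contributes roughly log10(2 * Err), about -3.7 with Err = 1e-4. Without the error term, a single OH locus would give a likelihood of zero and an LLR of -Inf.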

See Also

SummarySeq for visualisation of OH & LLR distributions; GenoConvert to read in various genotype data formats; CheckGeno to check the genotype data; PedPolish to check and 'polish' the pedigree; getAssignCat to find which id-parent pairs are both genotyped or can be substituted by dummy individuals; sequoia for pedigree reconstruction

Examples

library(sequoia)

# MyOldPedigree, MyNewGenotypes, Genotypes and LifeHist are placeholders
# for your own data (not run)

# have a quick look for errors in an existing pedigree,
# without running pedigree reconstruction
PedA <- CalcOHLLR(Pedigree = MyOldPedigree, GenoM = MyNewGenotypes,
  CalcLLR = FALSE)

# or run sequoia with CalcLLR = FALSE, and add OH + LLR later
SeqOUT <- sequoia(Genotypes, LifeHist, CalcLLR = FALSE)
PedA <- CalcOHLLR(Pedigree = SeqOUT$Pedigree[, 1:3], GenoM = Genotypes,
  LifeHistData = LifeHist, AgePrior = TRUE, Complex = "full")

# visualise
SummarySeq(PedA, Panels = c("LLR", "OH"))
