sequoia (version 2.3.5)

GetMaybeRel: Find Putative Relatives

Description

Identify pairs of individuals likely to be related, but not assigned as such in the provided pedigree.

Usage

GetMaybeRel(
  GenoM = NULL,
  SeqList = NULL,
  Pedigree = NULL,
  LifeHistData = NULL,
  AgePrior = NULL,
  ParSib = NULL,
  Module = "par",
  Complex = "full",
  Herm = "no",
  Err = 1e-04,
  ErrFlavour = "version2.0",
  MaxMismatch = NA,
  Tassign = 0.5,
  Tfilter = -2,
  MaxPairs = 7 * nrow(GenoM),
  quiet = FALSE
)

Arguments

GenoM

numeric matrix with genotype data: One row per individual, and one column per SNP, coded as 0, 1, 2 or -9 (missing). See also GenoConvert.
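As a minimal sketch of the coding described above, the following base-R snippet builds a small toy genotype matrix and checks that it contains only the permitted values. The matrix contents, the IDs and the helper `is_valid_genom` are hypothetical and not part of sequoia; the snippet also assumes that row names hold the individual IDs.

```r
# Toy genotype matrix: 4 individuals (rows) x 5 SNPs (columns).
# All values are made up for illustration only.
GenoM_toy <- matrix(
  c(0, 1,  2, -9, 1,
    2, 2,  1,  0, 0,
    1, 0, -9,  2, 1,
    0, 0,  1,  1, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("i01", "i02", "i03", "i04"), NULL)
)

# Hypothetical helper (not a sequoia function): TRUE if the input is a
# numeric matrix with row names that contains only 0, 1, 2 or -9.
is_valid_genom <- function(G) {
  is.matrix(G) && is.numeric(G) && !is.null(rownames(G)) &&
    all(G %in% c(0, 1, 2, -9))
}

is_valid_genom(GenoM_toy)
```

Genotypes stored in another format would first need converting, for which the page points to GenoConvert.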

SeqList

list with output from sequoia. SeqList$Pedigree is used if present, and SeqList$PedigreePar otherwise; either one overrides the input parameter Pedigree. If 'Specs' is present, its elements override all input parameters with the same name. The list elements 'LifeHist', 'AgePriors', and 'ErrM' are also used if present, and similarly override the corresponding input parameters.

Pedigree

dataframe with id - dam - sire in columns 1-3. May include non-genotyped individuals, which will be treated as dummy individuals. When provided, all likelihoods (and thus all maybe-relatives) are conditional on this pedigree. Note: SeqList$Pedigree or SeqList$PedigreePar takes precedence (for this function only).

LifeHistData

dataframe with 3 columns (optionally 5):

ID

max. 30 characters long

Sex

1 = female, 2 = male, 3 = unknown, 4 = hermaphrodite, other numbers or NA = unknown

BirthYear

birth or hatching year, integer, with missing values as NA or any negative value.

BY.min

minimum birth year, only used if BirthYear is missing

BY.max

maximum birth year, only used if BirthYear is missing

If the species has multiple generations per year, use an integer coding such that the candidate parents' 'Birth year' is at least one smaller than their putative offspring's. Column names are ignored, so ensure column order is ID - sex - birth year (- BY.min - BY.max). Individuals do not need to be in the same order as in 'GenoM', nor do all genotyped individuals need to be included.
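The column layout described above can be sketched in base R as follows. This is only an illustration: the IDs and values are made up, and the final check is a hypothetical sanity test rather than anything sequoia requires you to run.

```r
# Toy life-history data; IDs and values are made up.
LH_toy <- data.frame(
  ID        = c("i01", "i02", "i03", "i04"),
  Sex       = c(1, 2, 3, 1),          # 1 = female, 2 = male, 3 = unknown
  BirthYear = c(2001, 2001, NA, 2003),
  BY.min    = c(NA, NA, 2002, NA),    # only used where BirthYear is missing
  BY.max    = c(NA, NA, 2004, NA),
  stringsAsFactors = FALSE
)

# Column names are ignored, so it is the column order and count that
# matter: 3 columns, or 5 when BY.min and BY.max are included.
stopifnot(ncol(LH_toy) %in% c(3, 5),
          all(nchar(LH_toy[[1]]) <= 30))
```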

AgePrior

Age priors matrix, as generated by MakeAgePrior and included in the sequoia output. Affects which relationships are considered possible (only those where \(P(A|R) / P(A) > 0\)).
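To illustrate the condition \(P(A|R) / P(A) > 0\), here is a small sketch on a toy matrix of age-prior ratios. The layout (rows as age differences, columns as relationship types named M, P and FS) and all values are assumptions made for this example; inspect the actual MakeAgePrior output for the real structure.

```r
# Toy age-prior ratios P(A|R) / P(A): rows are age differences in years,
# columns are relationship types. Layout and values are assumptions.
AP_toy <- matrix(
  c(0.0, 0.0, 2.1,
    1.8, 1.6, 1.2,
    1.1, 1.3, 0.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(AgeDif = c("0", "1", "2"), Rel = c("M", "P", "FS"))
)

# A relationship is treated as possible at a given age difference
# only where the ratio is strictly positive.
possible <- AP_toy > 0
possible["0", ]
```

In this toy matrix, a pair born in the same year could be full siblings but not parent and offspring.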

ParSib

either 'par' to check for putative parent-offspring pairs only, or 'sib' to check for all types of first and second degree relatives. This argument will be deprecated; please use Module instead.

Module

type of relatives to check for. One of

par

parent - offspring pairs

ped

all first and second degree relatives

When 'par', all pairs are returned that are more likely parent-offspring than unrelated, potentially including pairs that are even more likely to be otherwise related.

Complex

Breeding system complexity. One of "full" (default), "simp" (simplified, no explicit consideration of inbred relationships), or "mono" (monogamous).

Herm

Hermaphrodites, either "no", "A" (distinguish between dam and sire role, default if at least 1 individual with sex=4), or "B" (no distinction between dam and sire role). Both of the latter deal with selfing.

Err

estimated genotyping error rate, as a single number or 3x3 matrix. Details below. The error rate is presumed constant across SNPs, and missingness is presumed random with respect to actual genotype.

ErrFlavour

function that takes Err (single number) as input, and returns a 3x3 matrix of observed (columns) conditional on actual (rows) genotypes, or choose from inbuilt options 'version2.0', 'version1.3', or 'version1.1', referring to the sequoia version in which they were the default. Ignored if Err is a matrix. See ErrToM.
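A user-supplied ErrFlavour function might look like the sketch below. The error model itself is a deliberately simple assumption made for illustration and is not one of sequoia's inbuilt flavours; the name `err_simple` is hypothetical.

```r
# Hypothetical error model (an assumption, not an inbuilt flavour):
# each actual genotype is observed correctly with probability 1 - E,
# and as each of the two other genotypes with probability E / 2.
err_simple <- function(E) {
  M <- matrix(E / 2, nrow = 3, ncol = 3,
              dimnames = list(actual   = c("0", "1", "2"),
                              observed = c("0", "1", "2")))
  diag(M) <- 1 - E
  M
}

ErrM_toy <- err_simple(1e-4)
rowSums(ErrM_toy)  # each row should be a probability distribution
```

Following the argument description, such a function would be passed as `ErrFlavour = err_simple` together with a single-number Err.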

MaxMismatch

DEPRECATED AND IGNORED. Now calculated automatically using CalcMaxMismatch.

Tassign

minimum LLR required for acceptance of proposed relationship, relative to next most likely relationship. Higher values result in more conservative assignments. Must be zero or positive.

Tfilter

threshold log10-likelihood ratio (LLR) between a proposed relationship versus unrelated, to select candidate relatives. Typically a negative value, related to the fact that unconditional likelihoods are calculated during the filtering steps. More negative values may decrease non-assignment, but will increase computational time.
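The roles of Tfilter and Tassign can be sketched with some simple arithmetic on log10-likelihoods. This is a simplified illustration only: the likelihood values are made up, the real filtering step uses unconditional likelihoods, and whether sequoia compares with strict or non-strict inequalities is an assumption here.

```r
# Made-up log10-likelihoods for one pair under three hypotheses.
LL <- c(PO = -250.3, FS = -251.9, U = -257.0)

Tfilter <- -2    # defaults from the Usage section
Tassign <- 0.5

# Filtering: LLR of the proposed relationship versus unrelated (U).
LLR_vs_U <- LL[["PO"]] - LL[["U"]]
passes_filter <- LLR_vs_U > Tfilter

# Assignment: LLR of the most likely relationship versus the next
# most likely one.
LL_sorted <- sort(LL, decreasing = TRUE)
LLR_top <- LL_sorted[[1]] - LL_sorted[[2]]
passes_assign <- LLR_top >= Tassign
```

With these toy numbers the pair passes both thresholds; raising Tassign above the gap between PO and FS would leave it unassigned.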

MaxPairs

the maximum number of putative pairs to return.

quiet

logical, suppress messages.

Value

A list with

MaybePar or MaybeRel

A dataframe with non-assigned likely relatives, with columns ID1 - ID2 - TopRel - LLR - OH - BirthYear1 - BirthYear2 - AgeDif - Sex1 - Sex2 - SNPdBoth

MaybeTrio

A dataframe with non-assigned parent-parent-offspring trios, with columns id - parent1 - parent2 - LLRparent1 - LLRparent2 - LLRpair - OHparent1 - OHparent2 - MEpair - SNPd.id.parent1 - SNPd.id.parent2

The following categories are used in column 'TopRel', indicating the most likely relationship category:

PO

Parent-Offspring

FS

Full Siblings

HS

Half Siblings

GP

GrandParent - grand-offspring

FA

Full Avuncular (aunt/uncle)

2nd

2nd degree relatives, not enough information to distinguish between HS, GP and FA

Q

Unclear, but probably 1st, 2nd or 3rd degree relatives
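Once a result is in hand, the 'TopRel' categories above lend themselves to simple tabulation and filtering. The sketch below uses a toy stand-in for the MaybeRel dataframe: it shows only a subset of the documented columns, and all IDs and values are made up.

```r
# Toy stand-in for a MaybeRel dataframe; only a subset of the columns
# is shown and all values are made up.
MaybeRel_toy <- data.frame(
  ID1    = c("i01", "i02", "i03", "i04"),
  ID2    = c("i05", "i06", "i07", "i08"),
  TopRel = c("PO", "FS", "2nd", "Q"),
  LLR    = c(4.2, 1.3, 0.8, 0.6),
  stringsAsFactors = FALSE
)

# Count pairs per category, then keep only those resolved to a specific
# relationship (dropping the ambiguous '2nd' and 'Q' categories).
table(MaybeRel_toy$TopRel)
resolved <- subset(MaybeRel_toy,
                   TopRel %in% c("PO", "FS", "HS", "GP", "FA"))
```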

Details

When Module="par", the age difference of the putative pair is temporarily set to NA so that genetic parent-offspring pairs declared to be born in the same year may be discovered. When Module="ped", only relationships possible given the age difference, if known from the LifeHistData, are considered.

See Also

sequoia to identify likely pairs of duplicate genotypes and for pedigree reconstruction; GetRelM to identify all pairs of relatives in a pedigree; CalcPairLL for the likelihoods underlying the LLR.

Examples

# NOT RUN {
SeqOUT <- sequoia(GenoM = SimGeno_example,
                  LifeHistData = LH_HSg5,
                  Module = "par",
                  quiet=TRUE, Plot=FALSE)
MaybePO <- GetMaybeRel(GenoM = SimGeno_example,
                      SeqList = SeqOUT)
head(MaybePO$MaybePar)

# instead of providing the entire SeqList, one may
# specify the relevant elements separately
Maybe <- GetMaybeRel(GenoM = SimGeno_example,
                     Pedigree = SeqOUT$PedigreePar,
                     LifeHistData = LH_HSg5,
                     Err=0.0001, Complex = "full",
                     Module = "ped")
head(Maybe$MaybeRel)

# visualise results, turn dataframe into matrix first:
MaybeM <- GetRelM(Pairs=Maybe$MaybeRel)
PlotRelPairs(MaybeM)
# }