genotype_probabilities: Calculate genotype probabilities for a target person

Description

For a chosen individual within a specified family, calculate that person's conditional genotype probabilities, given the family's phenotypes and relationship structure.

Usage

genotype_probabilities(target, fam, geno_freq, trans, penet, monozyg = NULL)

Value

A vector of length length(geno_freq), whose jth element is the conditional probability that the target person has genotype j, given the family's relationship structure and phenotypes. A vector of NAs will be returned if a row of penet consists entirely of zeroes or if the pedigree is impossible for any other reason (after restricting fam and penet to the connected component of the pedigree containing target).
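
To make the meaning of the returned vector concrete, consider the simplest possible pedigree: one person with no relatives. In that case the conditional genotype probabilities reduce to Bayes' rule applied to geno_freq and that person's row of penet. The sketch below uses base R only, with made-up numbers; it is an illustration of the quantity being returned, not the package's algorithm.

```r
# Hypothetical values for a single, unrelated person (not from the package data)
geno_freq <- c(0.81, 0.18, 0.01)   # population genotype frequencies, sum to 1
penet_row <- c(0.05, 0.40, 0.40)   # P(phenotype | genotype j) for this person

# Bayes' rule: posterior is proportional to prior times likelihood
unnormalised <- geno_freq * penet_row
posterior <- unnormalised / sum(unnormalised)
posterior

# If every penetrance is zero, the normalising constant is 0, so no valid
# probability vector exists (plain R arithmetic gives NaN for 0 / 0)
all_zero <- c(0, 0, 0)
degenerate <- geno_freq * all_zero / sum(geno_freq * all_zero)
degenerate
```

The degenerate case corresponds to the situation in which the documented function returns a vector of NAs.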

Arguments

target

The individual identifier (an element of fam$indiv) of the person in the pedigree fam whose genotype probabilities are being sought.

fam

A data frame specifying the family's relationship structure, with one row per person. It must contain the following columns, each of which will be coerced to character type (any other columns can be included but will be ignored):

indiv, an identifier for each individual person, with no duplicates in fam.
mother, the individual identifier of each person's mother, or missing (NA) for founders.
father, the individual identifier of each person's father, or missing (NA) for founders.

geno_freq

A vector of strictly positive numbers that sum to 1. If the possible genotypes of the underlying genetic model are 1:length(geno_freq) then geno_freq[j] is interpreted as the population frequency of genotype j. For certain genetic models that often occur in applications, these genotype frequencies can be calculated by geno_freq_monogenic, geno_freq_phased, etc.
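
As an illustration of a valid geno_freq, the following base R sketch computes Hardy-Weinberg genotype frequencies for a single biallelic locus. The genotype ordering (0, 1 or 2 copies of the minor allele) and the allele frequencies are assumptions made for this example; they are not claimed to match the output of geno_freq_monogenic.

```r
# Assumed allele frequencies: major allele 0.9, minor allele 0.1
p_alleles <- c(0.9, 0.1)
q <- p_alleles[2]

# Hardy-Weinberg proportions for 0, 1 and 2 copies of the minor allele
geno_freq <- c((1 - q)^2, 2 * q * (1 - q), q^2)

# The requirements stated above: strictly positive and summing to 1
stopifnot(all(geno_freq > 0), abs(sum(geno_freq) - 1) < 1e-12)
geno_freq
```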

trans

An ngeno^2 by ngeno matrix of non-negative numbers whose rows all sum to 1, where ngeno = length(geno_freq) is the number of possible genotypes. The rows of trans correspond to joint parental genotypes and the columns correspond to offspring genotypes. If the possible genotypes are 1:length(geno_freq) then the element trans[ngeno * gm + gf - ngeno, go] is interpreted as the conditional probability that a person has genotype go, given that his or her biological mother and father have genotypes gm and gf, respectively. For certain genetic models that often occur in applications, this transmission matrix can be calculated by trans_monogenic, trans_phased, etc.
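
The row indexing of trans can be easier to follow with a small hand-built matrix. The sketch below constructs a Mendelian transmission matrix for three genotypes in base R, using the row formula ngeno * gm + gf - ngeno given above. The genotype coding (genotype j carries j - 1 copies of a variant allele) is an assumption for this example and is not claimed to be what trans_monogenic produces.

```r
ngeno <- 3

# Assumed coding: genotype 1, 2, 3 = 0, 1, 2 copies of the variant allele,
# so the chance that a parent transmits the variant allele is 0, 0.5 or 1
p_variant <- c(0, 0.5, 1)

trans <- matrix(0, nrow = ngeno^2, ncol = ngeno)
for (gm in 1:ngeno) {
  for (gf in 1:ngeno) {
    pm <- p_variant[gm]   # mother transmits the variant allele
    pf <- p_variant[gf]   # father transmits the variant allele
    row <- ngeno * gm + gf - ngeno
    trans[row, ] <- c((1 - pm) * (1 - pf),
                      pm * (1 - pf) + (1 - pm) * pf,
                      pm * pf)
  }
}

# Every row is a probability distribution over offspring genotypes
stopifnot(all(abs(rowSums(trans) - 1) < 1e-12))

# Both parents heterozygous (gm = 2, gf = 2) corresponds to row 5
trans[ngeno * 2 + 2 - ngeno, ]   # 0.25 0.50 0.25
```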

penet

An nrow(fam) by length(geno_freq) matrix of non-negative numbers. The element penet[i,j] is interpreted as the conditional probability (or probability density) of the phenotype of the person corresponding to row i of fam, given that his or her genotype is j (where the possible genotypes are 1:length(geno_freq)). Note that genotype data can be incorporated into penet by regarding observed genotypes as part of the phenotype, i.e. by regarding observed genotypes as (possibly noisy) measurements of the underlying true genotypes. For example, if the observed genotype of person i is 1 (and if genotype measurement error is negligible) then penet[i,j] should be 0 for j != 1 and penet[i,1] should be the same as if person i were ungenotyped.
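
The rule for incorporating error-free observed genotypes can be sketched in base R as follows. The penetrance values and the observed vector are hypothetical; observed[i] is NA for ungenotyped people and otherwise holds the observed genotype of the person in row i.

```r
# Hypothetical penetrance matrix: 3 people (rows) by 3 genotypes (columns)
penet <- matrix(c(0.9, 0.5, 0.2,
                  0.1, 0.5, 0.8,
                  0.9, 0.5, 0.2), nrow = 3, byrow = TRUE)

# Hypothetical genotype data: only person 2 was genotyped, as genotype 1
observed <- c(NA, 1, NA)

# For each genotyped person, zero every column except the observed genotype,
# leaving the entry for the observed genotype unchanged
for (i in which(!is.na(observed))) {
  penet[i, -observed[i]] <- 0
}
penet
```

Rows for ungenotyped people are left untouched, so their phenotype information is still used in full.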

monozyg

An optional list that can be used to specify genetically identical persons, such as monozygotic twins, monozygotic triplets, a monozygotic pair within a set of dizygotic triplets, etc. Each element of the list should be a vector containing the individual identifiers of a group of genetically identical persons. For example, if fam contains a set of monozygotic twins (and no other genetically identical persons) then monozyg will be a list with one element, and that element will be a vector of length two containing the individual identifiers of the twins. The order of the list and the orders of its elements do not affect the output of the function. Each group of genetically identical persons must contain two or more persons, the groups must not overlap, and all persons in each group must have the same (non-missing) parents.
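
The constraints on monozyg can be checked before calling the function. The helper below, check_monozyg, is hypothetical and not part of the package; it is a base R sketch that tests the three conditions stated above against a made-up fam.

```r
# Made-up family: two founders and three children, two of them identical twins
fam <- data.frame(
  indiv  = c("f1", "m1", "c1", "c2", "c3"),
  mother = c(NA, NA, "m1", "m1", "m1"),
  father = c(NA, NA, "f1", "f1", "f1"),
  stringsAsFactors = FALSE
)
monozyg <- list(c("c1", "c2"))

# Hypothetical helper (not a package function): TRUE if monozyg is well formed
check_monozyg <- function(monozyg, fam) {
  ids <- unlist(monozyg)
  sizes_ok <- all(lengths(monozyg) >= 2)        # two or more persons per group
  disjoint <- anyDuplicated(ids) == 0           # groups do not overlap
  parents_ok <- all(vapply(monozyg, function(g) {
    rows <- fam[match(g, fam$indiv), ]
    !anyNA(rows$mother) && !anyNA(rows$father) &&   # parents are not missing
      length(unique(rows$mother)) == 1 &&           # same mother
      length(unique(rows$father)) == 1              # same father
  }, logical(1)))
  sizes_ok && disjoint && parents_ok
}

check_monozyg(monozyg, fam)
```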

Details

The genotype probabilities are calculated by essentially the same algorithm as pedigree_loglikelihood; see there for details. The genotype probabilities only depend on the connected component of the pedigree that contains target, so the function first restricts fam and penet to the rows corresponding to this connected component. For example, if fam is the union of two unrelated families then this function will restrict to the subfamily containing target before performing the calculation.
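
The restriction to a connected component can be pictured with a short base R sketch. The function component_of is hypothetical and only illustrates the idea; it is not the package's implementation. Starting from target, it repeatedly adds the parents and children of everyone found so far until nothing new appears.

```r
# Made-up data: two unrelated trios stored in one data frame
fam <- data.frame(
  indiv  = c("a1", "a2", "a3", "b1", "b2", "b3"),
  mother = c(NA, NA, "a1", NA, NA, "b1"),
  father = c(NA, NA, "a2", NA, NA, "b2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: logical vector marking rows connected to target
component_of <- function(target, fam) {
  found <- target
  repeat {
    in_found <- fam$indiv %in% found
    # Parents of people already found (founders contribute NA, dropped below)
    parents <- c(fam$mother[in_found], fam$father[in_found])
    # Children of people already found
    children <- fam$indiv[fam$mother %in% found | fam$father %in% found]
    updated <- union(found, c(parents[!is.na(parents)], children))
    if (length(updated) == length(found)) break
    found <- updated
  }
  fam$indiv %in% found
}

keep <- component_of("a3", fam)
fam[keep, ]   # only the "a" family remains
```

The same logical vector would be used to subset the rows of penet, for example penet[keep, , drop = FALSE], so that fam and penet stay aligned.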

Examples

# Read in some sample data
data("dat_small", "penet_small")
str(dat_small)
str(penet_small)

# Calculate the genotype probabilities for individual "ora008" in the family "ora"
w <- which(dat_small$family == "ora")
fam <- dat_small[w, -1]
penet <- penet_small[w, ]
monozyg <- list(c("ora024", "ora027"))  # ora024 and ora027 are identical twins
trans <- trans_monogenic(2)
geno_freq <- geno_freq_monogenic(p_alleles = c(0.9, 0.1))
genotype_probabilities(target = "ora008", fam, geno_freq, trans, penet, monozyg)