genotype_combinations: Calculate genotype probabilities

Description

This function computes the joint probabilities of the possible genotypes of selected family members within each family, as an intermediate calculation for use in ML_estimates.

Usage

genotype_combinations(dat, ncores = 1, verbose = TRUE)

Arguments

dat

A data frame with rows corresponding to people and columns corresponding to (at least) the following variables, which will be coerced to character type:

family (family ID), an identifier for each person's family, constant within families
indiv (individual ID), an identifier for each person, with no duplicates across the dataset
mother (mother ID), the individual ID of each person's mother, or missing (NA) for founders
father (father ID), the individual ID of each person's father, or missing (NA) for founders
typed (epi-genotyped), equal to 1 for people with methylation data and 0 for all others

ncores

The number of cores to be used, with ncores = 1 (the default) corresponding to non-parallel computing. When ncores > 1, the parallel package is used to parallelize the calculation of the joint probabilities of the possible genotype combinations.

verbose

FALSE if user wants to suppress messages. Default is TRUE.

Value

A named list, with names equal to the different family IDs and with each element of the list being a data frame specifying the possible genotypes of selected family members (those with dat$typed = 1) within each family, and the joint probability of each genotype combination. For each individual, the possible genotypes are 0, corresponding to the wildtype, and 1, for carriers of a rare genetic variant at an autosomal locus. The calculation makes the same assumptions as in (Joo et al., 2018), including that the genetic variant is so rare that at most one founder is a carrier.

Details

Currently, there is a maximum of 20 people per family with typed = 1, due to the need to store all genotype combinations for the typed people. It is possible to break families with more than 20 typed people into smaller families, though this is not ideal.

Each family within dat should be a complete pedigree, meaning that each (non-missing) mother or father ID should correspond to a row, and each person should either have both parent IDs missing (if a founder) or non-missing (if a non-founder). No family should contain a pedigree loop, such as those caused by inbreeding or by two sisters having children with two brothers from an unrelated family.

References

Joo JE, Dowty JG, Milne RL, Wong EM, Dugu<U+00E9> PA, English D, Hopper JL, Goldgar DE, Giles GG, Southey MC, kConFab. Heritable DNA methylation marks associated with susceptibility to breast cancer. Nat Commun. 2018 Feb 28;9(1):867. https://doi.org/10.1038/s41467-018-03058-6

Examples

Run this code

# NOT RUN {
# Load family data
data(ped)

# Calculate genotype probabilites
typed_genos <- genotype_combinations(ped)
str(typed_genos)

# }

Run the code above in your browser using DataLab