sim_RVped
simulates a pedigree ascertained to contain multiple affected members, selects a proband, and trims the pedigree to contain only those individuals that are recalled by the proband.
sim_RVped(hazard_rates, GRR, num_affected, ascertain_span, FamID,
          founder_byears, stop_year = NULL, recall_probs = NULL,
          carrier_prob = 0.002, RVfounder = FALSE, NB_params = c(2, 4/7),
          fert = 1, first_diagnosis = NULL, birth_range = NULL)
hazard_rates: An object of class hazard, created by hazard.
GRR: Numeric. The genetic relative-risk of disease, i.e. the relative-risk of disease for individuals who carry at least one copy of the causal variant.
num_affected: Numeric. The minimum number of affected individuals in the pedigree.
ascertain_span: Numeric vector of length 2. The year span of the ascertainment period. This period represents the range of years during which the proband developed disease and the family would have been ascertained for multiple affected relatives.
FamID: Numeric. The family ID to assign to the simulated pedigree.
founder_byears: Numeric vector of length 2. The span of years from which to simulate, uniformly, the birth year for the founder who introduced the rare variant to the pedigree.
stop_year: Numeric. The last year of study. If not supplied, defaults to the current year.
recall_probs: Numeric. The proband's recall probabilities for relatives, see details. If not supplied, the default value of four times the kinship coefficient between the proband and the relative is used.
carrier_prob: Numeric. The carrier probability for all causal variants with relative-risk of disease GRR. By default, carrier_prob = 0.002.
RVfounder: Logical. Indicates if all pedigrees segregate the rare, causal variant. By default, RVfounder = FALSE. See details.
NB_params: Numeric vector of length 2. The size and probability parameters of the negative binomial distribution used to model the number of children per household. By default, NB_params = c(2, 4/7), following the investigation of Kojima and Kelleher (1962).
fert: Numeric. A constant used to rescale the fertility rate after disease-onset. By default, fert = 1.
first_diagnosis: Numeric. The first year that reliable diagnoses can be obtained regarding disease-affection status. By default, first_diagnosis = NULL so that all diagnoses are considered reliable. See details.
birth_range: This argument is deprecated.
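To make the NB_params default concrete, the sketch below draws household sizes from a negative binomial distribution with size 2 and probability 4/7 using base R's rnbinom. Whether sim_RVped calls rnbinom directly is not stated here, so treat this as an illustration of the distribution only.

```r
# Default parameters from the usage above: size = 2, prob = 4/7.
NB_params <- c(2, 4/7)

set.seed(6)
# Number of children in each of 1000 hypothetical households.
n_children <- rnbinom(1000, size = NB_params[1], prob = NB_params[2])

# Theoretical mean of this distribution: size * (1 - prob) / prob.
expected_mean <- NB_params[1] * (1 - NB_params[2]) / NB_params[2]
expected_mean       # 1.5 children per household on average
mean(n_children)    # empirical mean of the simulated draws
```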
A list containing the following data frames:
full_ped: The full pedigree, prior to proband selection and trimming.
ascertained_ped: The ascertained pedigree, with proband selected and trimmed according to proband recall probability. See details.
When RVfounder = TRUE, all simulated pedigrees will segregate a genetic susceptibility variant. In this scenario, we assume that the variant is rare enough that it has been introduced by one founder, and we begin the simulation of the pedigree with this founder. Alternatively, when RVfounder = FALSE, the starting founder carries the causal variant with probability carrier_prob, so pedigrees may not segregate the genetic susceptibility variant. The default selection is RVfounder = FALSE. Additionally, we note that sim_RVped is intended for rare causal variants; users will receive a warning if carrier_prob > 0.002.
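The two founder settings can be sketched in a few lines of base R. The helper draw_founder_status is hypothetical and not part of the package; it only illustrates the rule described above.

```r
# Hypothetical helper: decide whether the starting founder carries the
# rare variant.
draw_founder_status <- function(RVfounder, carrier_prob = 0.002) {
  if (RVfounder) {
    return(TRUE)  # the founder always introduces the variant
  }
  # Otherwise the founder is a carrier with probability carrier_prob.
  runif(1) < carrier_prob
}

set.seed(1)
# Proportion of carrier founders across many simulated families;
# this should be close to carrier_prob.
mean(replicate(10000, draw_founder_status(RVfounder = FALSE)))
```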
We note that when GRR = 1, pedigrees do not segregate the causal variant regardless of the setting selected for RVfounder. When the causal variant is introduced to the pedigree, we transmit it from parent to offspring according to Mendel's laws.
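As an illustration of Mendelian transmission, the sketch below passes the variant from a parent to a child with probability equal to half the number of copies the parent carries. The helper transmit_allele is hypothetical and is not the package's internal routine.

```r
# Hypothetical helper: number of variant copies (0 or 1) passed to a child
# by a parent who carries n_copies (0, 1, or 2) copies of the variant.
transmit_allele <- function(n_copies) {
  # A parent passes one of its two alleles chosen at random, so the chance
  # of passing the variant is n_copies / 2.
  rbinom(1, size = 1, prob = n_copies / 2)
}

set.seed(1)
# A heterozygous founder with a non-carrier spouse: each of six children
# inherits the variant with probability 1/2.
children <- replicate(6, transmit_allele(1) + transmit_allele(0))
children
```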
We begin simulating the pedigree by generating the year of birth, uniformly, between the years specified in founder_byears for the starting founder. Next, we simulate this founder's life events using the sim_life function, and censor any events that occur after the study stop_year. Possible life events include: reproduction, disease onset, and death. We continue simulating life events for any offspring, censoring events which occur after the study stop year, until the simulation process terminates. We do not simulate life events for marry-ins, i.e. individuals who mate with either the starting founder or offspring of the starting founder.
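The first two steps, drawing the founder's birth year and censoring events at the stop year, can be sketched as follows. Both helpers and the event offsets are hypothetical; the real life events come from sim_life, and whether the package rounds the birth year is not stated here.

```r
# Hypothetical helper: draw the founder's birth year uniformly over
# founder_byears.
draw_birth_year <- function(founder_byears) {
  runif(1, min = founder_byears[1], max = founder_byears[2])
}

# Hypothetical helper: keep only life events dated on or before stop_year.
# `events` is a named numeric vector of event years.
censor_events <- function(events, stop_year) {
  events[events <= stop_year]
}

set.seed(5)
birth_year <- draw_birth_year(c(1900, 1905))

# Made-up life events, expressed as ages added to the birth year.
events <- c(Child = birth_year + 28,
            Onset = birth_year + 61,
            Death = birth_year + 120)

kept <- censor_events(events, stop_year = 2017)
names(kept)  # the death falls after 2017 and is censored
```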
We do not model disease remission. Rather, we impose the restriction that individuals may only experience disease onset once, and remain affected from that point on. If disease onset occurs then we apply the hazard rate for death in the affected population.
sim_RVped will only return ascertained pedigrees with at least num_affected affected individuals. That is, if a simulated pedigree does not contain at least num_affected affected individuals, sim_RVped will discard the pedigree and simulate another until the condition is met. We note that even for num_affected = 2, sim_RVped can be computationally expensive. To simulate a pedigree with no proband and no minimum number of affected members, use sim_ped instead.
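The discard-and-retry behaviour is a rejection loop, which also explains the computational cost. The sketch below uses a toy family generator in place of the real pedigree simulation; sim_toy_family and sim_until_ascertained are hypothetical names, and the affection probability is made up.

```r
# Toy stand-in for a pedigree simulator (hypothetical): one row per
# relative, with a logical `affected` column.
sim_toy_family <- function(n_members = 10, p_affected = 0.1) {
  data.frame(ID = seq_len(n_members),
             affected = runif(n_members) < p_affected)
}

# Repeat the simulation until at least num_affected members are affected.
sim_until_ascertained <- function(num_affected, max_tries = 10000) {
  for (attempt in seq_len(max_tries)) {
    fam <- sim_toy_family()
    if (sum(fam$affected) >= num_affected) {
      return(list(ped = fam, attempts = attempt))
    }
  }
  stop("no qualifying family found within max_tries attempts")
}

set.seed(3)
res <- sim_until_ascertained(num_affected = 2)
res$attempts           # number of families simulated before one qualified
sum(res$ped$affected)  # at least 2 by construction
```

The rarer the qualifying outcome, the more families are discarded, so stricter values of num_affected lengthen the loop.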
Upon simulating a pedigree with at least num_affected affected individuals, sim_RVped chooses a proband from the set of available candidates. Candidates for proband selection must have the following qualities:
1. experienced disease onset between the years specified by ascertain_span,
2. if fewer than num_affected - 1 individuals experienced disease onset prior to the lower bound of ascertain_span, a proband is chosen from the affected individuals, such that there were at least num_affected affected individuals when the pedigree was ascertained through the proband.
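One reading of these two conditions is sketched below: a candidate must have onset inside ascertain_span, and by the candidate's onset year the family must already contain at least num_affected affected members, the candidate included. The helper proband_candidates and the onset years are hypothetical, and the package's own selection logic may differ in details such as tie handling.

```r
# onset_years: hypothetical vector of disease-onset years, NA if unaffected.
# Returns the positions of relatives who qualify as proband candidates.
proband_candidates <- function(onset_years, ascertain_span, num_affected) {
  in_span <- !is.na(onset_years) &
    onset_years >= ascertain_span[1] &
    onset_years <= ascertain_span[2]

  # Number of relatives affected by the time of each person's onset,
  # counting the person themselves.
  n_affected_by <- vapply(onset_years, function(y) {
    if (is.na(y)) 0L else sum(onset_years <= y, na.rm = TRUE)
  }, integer(1))

  which(in_span & n_affected_by >= num_affected)
}

onset <- c(NA, 1980, 1998, NA, 2004, 2020)
proband_candidates(onset, ascertain_span = c(1995, 2015), num_affected = 2)
# 3 5
proband_candidates(onset, ascertain_span = c(1995, 2015), num_affected = 3)
# 5
```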
We allow users to specify the first year that reliable diagnoses can be made using the argument first_diagnosis. All subjects who experience disease onset prior to this year are not considered when ascertaining the pedigree for a specific number of disease-affected relatives. By default, first_diagnosis = NULL so that all affected relatives, recalled by the proband, are considered when ascertaining the pedigree.
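The effect of first_diagnosis on the affected count can be sketched as a simple filter. The helper count_reliable_affected and the onset years are hypothetical.

```r
# Hypothetical helper: count affected relatives whose onset year is
# considered reliably diagnosed.
count_reliable_affected <- function(onset_years, first_diagnosis = NULL) {
  affected <- !is.na(onset_years)
  if (!is.null(first_diagnosis)) {
    # Ignore onsets that predate the first year of reliable diagnosis.
    affected <- affected & onset_years >= first_diagnosis
  }
  sum(affected)
}

onset <- c(NA, 1940, 1975, 2001)
count_reliable_affected(onset)                          # 3
count_reliable_affected(onset, first_diagnosis = 1950)  # 2
```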
After the proband is selected, the pedigree is trimmed based on the proband's recall probability of his or her relatives. This option is included to allow researchers to model the possibility that a proband either cannot provide a complete family history or that they explicitly request that certain family members not be contacted. If recall_probs is missing, the default values of four times the kinship coefficient, as defined by Thompson (see references), between the proband and his or her relatives are assumed. This has the effect of retaining all first degree relatives with probability 1, retaining all second degree relatives with probability 0.5, retaining all third degree relatives with probability 0.25, etc. Alternatively, the user may specify a vector of length \(l\), such that the first \(l-1\) items represent the respective recall probabilities for relatives of degree \(1, 2, ... , l-1\) and the \(l^{th}\) item represents the recall probability of a relative of degree \(l\) or greater. For example, if recall_probs = c(1, 0.75, 0.5), then all first degree relatives (i.e. parents, siblings, and offspring) are retained with probability 1, all second degree relatives (i.e. grandparents, grandchildren, aunts, uncles, nieces and nephews) are retained with probability 0.75, and all other relatives are retained with probability 0.5. To simulate fully ascertained pedigrees, simply specify recall_probs = c(1).
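The mapping from degree of relationship to recall probability can be sketched as below. The helper recall_prob_for is hypothetical; its default branch assumes no inbreeding, so that a relative of degree d has kinship coefficient (1/2)^(d + 1) with the proband.

```r
# Hypothetical helper: recall probability for relatives of the given degree.
recall_prob_for <- function(degree, recall_probs = NULL) {
  if (is.null(recall_probs)) {
    # Default: four times the kinship coefficient, 4 * (1/2)^(degree + 1).
    return(4 * 0.5^(degree + 1))
  }
  # Degrees beyond the last entry reuse the last entry.
  recall_probs[pmin(degree, length(recall_probs))]
}

recall_prob_for(1:4)                                  # 1 0.5 0.25 0.125
recall_prob_for(1:4, recall_probs = c(1, 0.75, 0.5))  # 1 0.75 0.5 0.5

# Trimming then amounts to one Bernoulli draw per relative.
set.seed(4)
degrees <- c(1, 1, 2, 3, 4)
keep <- runif(length(degrees)) < recall_prob_for(degrees, c(1, 0.75, 0.5))
keep  # first degree relatives are always retained
```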
In the event that a trimmed pedigree fails the num_affected condition, sim_RVped will discard that pedigree and simulate another until the condition is met. For this reason, the values specified for recall_probs affect computation time.
Nieuwoudt, C., Jones, S. J., Brooks-Wilson, A., and Graham, J. (24 September 2018). Simulating Pedigrees Ascertained for Multiple Disease-Affected Relatives. <doi:10.1101/234153>.
Kojima, K.-I. and Kelleher, T. M. (1962). Survival of Mutant Genes. The American Naturalist, 96, 329-346.
Thompson, E. (2000). Statistical Inference from Genetic Data on Pedigrees. NSF-CBMS Regional Conference Series in Probability and Statistics, 6, I-169.
# NOT RUN {
# Load the package that provides sim_RVped, hazard, and the example data
library(SimRVPedigree)

# Read in age-specific hazards
data(AgeSpecific_Hazards)

# Simulate a pedigree ascertained for multiple affected individuals
set.seed(2)
ex_RVped <- sim_RVped(hazard_rates = hazard(hazardDF = AgeSpecific_Hazards),
                      GRR = 20,
                      RVfounder = TRUE,
                      FamID = 1,
                      founder_byears = c(1900, 1905),
                      ascertain_span = c(1995, 2015),
                      num_affected = 2,
                      stop_year = 2017,
                      recall_probs = c(1, 1, 0))
# Observe: ex_RVped is a list containing two ped objects
summary(ex_RVped)
# The first is the original pedigree prior
# to proband selection and trimming
plot(ex_RVped[[1]])
# The second is the ascertained pedigree which
# has been trimmed based on proband recall
plot(ex_RVped[[2]])
summary(ex_RVped[[2]])
# NOTE: by default, RVfounder = FALSE.
# Under this setting pedigrees segregate a causal
# variant with probability equal to carrier_prob.
# }