trio.sim
generates case-parent trios when the disease
risk of children is specified by (possibly higher-order) SNP-SNP
interactions. The SNP minor allele frequencies and/or haplotypes are
specified by the user, as are the parameters in the logistic model
that describes the disease risk. If pi.usr is specified, a
specific type of model, namely the well-known Risch model, will be employed.
trio.sim(freq, interaction = "1R and 2D", prev = 1e-3, OR = 1, pi.usr = 0,
    n = 100, rep = 1, step.save = NULL, step.load = NULL, verbose = FALSE)
simuBkMap contained in this package. If provided, the following argument
blocks will be ignored.
The object must have three columns in the following order: block
identifiers (key), haplotypes (hap), and haplotype
frequencies (freq). The block identifiers must be unique for
each block. For each block, the haplotypes must be encoded as a
string of the integers 1 and 2, where 1 refers to the major allele
and 2 refers to the minor allele. The respective haplotype
frequencies will be normalized to sum to one.
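As a rough illustration of this layout, the following sketch builds a small haplotype table by hand and rescales the frequencies within each block. The block names, haplotypes, and frequencies are made up, and the rescaling step is only an assumption about what the normalization amounts to; it is not the package's own code.

```r
# Hypothetical haplotype table with the three required columns, in order:
# key (block identifier), hap (haplotype string of 1s and 2s), freq.
hap_tab <- data.frame(
  key  = c("B1", "B1", "B1", "B2", "B2"),
  hap  = c("11", "12", "22", "111", "212"),
  freq = c(0.5, 0.3, 0.1, 2, 6),   # deliberately not summing to one
  stringsAsFactors = FALSE
)

# Check the encoding: only 1 (major allele) and 2 (minor allele) are allowed.
stopifnot(all(grepl("^[12]+$", hap_tab$hap)))

# Rescale the frequencies so that they sum to one within each block
# (assumed to mirror the normalization described in the text).
hap_tab$freq <- hap_tab$freq / ave(hap_tab$freq, hap_tab$key, FUN = sum)

# Per-block totals after rescaling.
tapply(hap_tab$freq, hap_tab$key, sum)
```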
interaction
to be affected
NULL. In that case, the object will not be saved for re-use in a
later run. See Details.
NULL. In that case, a new object will be generated.
trio.sim
simulates case-parent trio data
when the disease risk of children is specified by (possibly
higher-order) SNP-SNP interactions. The mating tables and the
respective sampling probabilities depend on the haplotype frequencies
(or SNP minor allele frequencies when the SNP does not belong to a
block). This information is specified in the freq argument of
the function. The probability of disease is assumed to be described
by the logistic term logit(p) = a + b I[Interaction],
where a = logit(prev) and b = log(OR),
with prev and OR specified by the user. Note that at
this point only data for two risk groups (carriers versus
non-carriers) can be simulated. Since the computational demands for
generating the mating tables depend on the number of loci involved in
the interactions and the lengths of the LD blocks that contain these
disease loci, the interaction term can consist of at most six loci,
with no more than one of those loci per block, and haplotype (block)
lengths of at most 5 loci.
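The logistic term can be checked numerically with a few lines of base R. This is a minimal sketch of the formula only, not of the package's internals; the helper name disease_prob is hypothetical. Under this parameterization, non-carriers have disease probability prev, and the odds for carriers are OR times the odds for non-carriers.

```r
# Disease probability under logit(p) = a + b * I[Interaction],
# with a = logit(prev) and b = log(OR).
# disease_prob() is a hypothetical helper, not part of the package.
disease_prob <- function(carrier, prev, OR) {
  a <- qlogis(prev)   # logit of the risk for non-carriers
  b <- log(OR)        # log odds ratio for carriers
  plogis(a + b * as.numeric(carrier))
}

p_non <- disease_prob(FALSE, prev = 0.001, OR = 2)  # risk for non-carriers
p_car <- disease_prob(TRUE,  prev = 0.001, OR = 2)  # risk for carriers

# Ratio of the odds of disease, carriers versus non-carriers.
odds_ratio <- (p_car / (1 - p_car)) / (p_non / (1 - p_non))
```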
Generating the mating tables and the respective sampling probabilities
necessary to simulate case-parent trios can be very time consuming for
interaction models involving three or more SNPs. In simulation
studies, many replicates of similar data are usually required, and
generating these sampling probabilities in each instance would be a
large and avoidable computational burden (CPU and memory). The
sampling probabilities depend foremost on the interaction term and the
underlying haplotype frequencies, and as long as these remain constant
in the simulation study, the mating table information and the sampling
probabilities can be "recycled". This is done by storing the relevant
information (denoted as "step-stone") as a binary R file in the
working directory (using the argument step.save), and loading
the binary file again in future simulations (using the argument
step.load), speeding up the simulation process dramatically.
It is even possible to change the parameters prev and OR
(corresponding to a and b in the logistic model) in
these additional simulations, as the sampling probabilities can be
adjusted accordingly.
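The recycling idea can be illustrated generically in base R. The sketch below is not the package's internal code and does not call trio.sim: expensive_tables() is a hypothetical stand-in for the costly computation, and the file name is made up. It only shows the pattern of computing an object once, saving it as a binary file, and reloading it in later runs.

```r
# Hypothetical stand-in for an expensive computation that depends only on
# the interaction term and the haplotype frequencies.
expensive_tables <- function(interaction, freq) {
  list(interaction = interaction, probs = freq / sum(freq))
}

# Return the saved object if it exists; otherwise compute and save it.
get_tables <- function(interaction, freq, file) {
  if (file.exists(file)) {
    return(readRDS(file))          # reuse the saved "step-stone"
  }
  tabs <- expensive_tables(interaction, freq)
  saveRDS(tabs, file)              # store it for later runs
  tabs
}

# A temporary directory is used here purely for the demonstration.
cache_file <- file.path(tempdir(), "stepstone_demo.rds")
unlink(cache_file)                 # start from a clean state

first  <- get_tables("1R and 5R", c(0.2, 0.3, 0.5), cache_file)  # computed, saved
second <- get_tables("1R and 5R", c(0.2, 0.3, 0.5), cache_file)  # read from disk
```

Because the cached object depends only on the inputs that stay fixed across replicates, later runs skip the expensive step entirely.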
trio.prepare
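# Load the example data shipped with the package (provides simuBkMap)
data(trio.data)
# Simulate one replicate in which risk depends on the interaction "1R and 5R"
sim <- trio.sim(freq = simuBkMap, interaction = "1R and 5R", prev = 0.001, OR = 2, n = 20, rep = 1)
# Show the first six rows and twelve columns of the first replicate
sim[[1]][1:6, 1:12]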