scrime (version 1.2.9)

simulateSNPs: Simulation of SNP data

Description

Simulates SNP data in which a specified proportion of cases and controls is explained by a specified set of SNP interactions. Can also be used to simulate a data set with a multi-categorical response, i.e. a data set in which the cases are divided into several classes (e.g., different diseases or subtypes of a disease).

Usage

simulateSNPs(n.obs, n.snp, vec.ia, prop.explain = 1, 
  list.ia.val = NULL, vec.ia.num = NULL, vec.cat = NULL,
  maf = c(0.1, 0.4), prob.val = rep(1/3, 3), list.equal = NULL, 
  prob.equal = 0.8, rm.redundancy = TRUE, shuffle = FALSE, 
  shuffle.obs = FALSE, rand = NA)

Arguments

n.obs
either an integer specifying the total number of observations, or a vector of length 2 specifying the number of cases and the number of controls. If vec.cat is specified, then the partitioning of the number of cases to the di
n.snp
integer specifying the number of SNPs.
vec.ia
a vector of integers specifying the orders of the interactions that explain the cases. For example, c(3, 1, 2, 3) means that a three-way, a one-way (i.e. just a SNP), a two-way, and a three-way interaction explain the cases.
prop.explain
either a numeric value or a vector of length(vec.ia) specifying the proportions of cases explained by the interactions of interest among all observations having the interaction of interest. Must be larger than 0.5. E.g., prop.
list.ia.val
a list of length(vec.ia) specifying the exact interactions. The objects in this list must be vectors of length vec.ia[i], and consist of the values 0 (for homozygous reference), 1 (heterozygous variant), or 2 (homozygou
vec.ia.num
a vector of length(vec.ia) specifying the number of cases (not observations) explained by the interactions in vec.ia. If NULL, all the cases are divided into length(vec.ia) groups of
vec.cat
a vector of the same length as vec.ia specifying the subclasses of the cases that are explained by the corresponding interaction in vec.ia. If NULL, no subclasses will be considered. This feature is current
maf
either a numeric value, or a vector of length 2 or n.snp specifying the minor allele frequencies. If a single value, all SNPs will have the same minor allele frequency. If a vector of length n.snp, each SNP will have the minor
prob.val
a vector consisting of the probabilities for drawing a 0, 1, or 2, if list.ia.val = NULL, i.e. if the genotypes of the SNPs explaining the case-control status should be randomly drawn. Ignored if list.ia.val is specifie
list.equal
a list of the same structure as list.ia.val containing only ones and zeros, where a 1 specifies the equality to the corresponding value in list.ia.val, and a 0 specifies the non-equality. Thus, the entries of list.equal
prob.equal
a numeric value specifying the probability that a 1 is drawn when generating list.equal. prob.equal is thus the probability for an equal sign.
rm.redundancy
should redundant SNPs be removed from the explaining interactions? It is possible to specify an explaining $i$-way interaction, but an interaction between $(i-1)$ of the variables contained in the $i$-way interaction already explains
shuffle
logical. By default, the first sum(vec.ia) columns of the generated data set contain the explanatory SNPs in the same order as they appear in this data set. If TRUE, this order will be shuffled.
shuffle.obs
should the observations be shuffled?
rand
integer. Sets the random number generator in a reproducible state.
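
The pairing of list.ia.val and list.equal described above can be easier to read when written out. The following base-R sketch uses a hypothetical helper, describe_interaction, which is not part of scrime, to render one pair of vectors as the logical expression it stands for. The SNP labels and the first argument are assumptions made purely for display.

```r
# Hypothetical helper (not part of scrime): render one interaction as text.
# vals  : genotype codes (0, 1, or 2), as in one element of list.ia.val
# equal : 1 for "==", 0 for "!=", as in one element of list.equal
# first : index of the first SNP in the interaction (assumed, for labelling only)
describe_interaction <- function(vals, equal, first = 1) {
  stopifnot(length(vals) == length(equal),
            all(vals %in% 0:2), all(equal %in% 0:1))
  snp <- paste0("SNP", seq(first, length.out = length(vals)))
  op <- ifelse(equal == 1, "==", "!=")
  paste0("(", snp, " ", op, " ", vals, ")", collapse = " & ")
}

describe_interaction(c(2, 0, 1), c(1, 0, 1))
describe_interaction(c(0, 2), c(1, 0), first = 4)
```

The two calls use the same vectors as the last example on this page, so the strings they return can be compared with the interactions written in that example's comments.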

Value

An object of class simulatedSNPs composed of

  • data: a matrix with n.obs rows and n.snp columns containing the SNP data.
  • cl: a vector of length n.obs comprising the case-control status of the observations.
  • tab.explain: a table naming the explanatory interactions and the numbers of cases and controls explained by them.
  • ia: character vector naming the interactions.
  • maf: vector of length n.snp containing the minor allele frequencies.
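
As a minimal sketch of how the components listed above might be inspected, the following assumes that the scrime package is installed and that the returned object is list-like, so that its components can be reached with `$`. The seed value passed to rand is arbitrary.

```r
# Assumes the scrime package is installed; the block is skipped otherwise.
if (requireNamespace("scrime", quietly = TRUE)) {
  # One three-way and two two-way interactions; rand fixes the seed.
  sim <- scrime::simulateSNPs(2000, 50, c(3, 2, 2), rand = 123)

  print(dim(sim$data))     # n.obs rows, n.snp columns
  print(table(sim$cl))     # case-control status
  print(sim$tab.explain)   # cases and controls explained per interaction
  print(sim$ia)            # names of the interactions
  print(length(sim$maf))   # one minor allele frequency per SNP
}
```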

See Also

simulateSNPglm, simulateSNPcatResponse

Examples

# Simulate a data set containing 2000 observations (1000 cases
# and 1000 controls) and 50 SNPs, where one three-way and two 
# two-way interactions are chosen randomly to be explanatory 
# for the case-control status.

sim1 <- simulateSNPs(2000, 50, c(3, 2, 2))
sim1

# Simulate data of 1200 cases and 800 controls for 50 SNPs, 
# where 90% of the observations showing a randomly chosen 
# three-way interaction are cases, and 95% of the observations 
# showing a randomly chosen two-way interaction are cases.

sim2 <- simulateSNPs(c(1200, 800), 50, c(3, 2), 
   prop.explain = c(0.9, 0.95))
sim2

# Simulate a data set consisting of 1000 observations and 50 SNPs,
# where the minor allele frequency of each SNP is 0.25, and
# the interactions 
# ((SNP1 == 2) & (SNP2 != 0) & (SNP3 == 1))   and 
# ((SNP4 == 0) & (SNP5 != 2))
# are explanatory for 200 and 250 of the 500 cases, respectively,
# and for none of the 500 controls.

list1 <- list(c(2, 0, 1), c(0, 2))
list2 <- list(c(1, 0, 1), c(1, 0))
sim3 <- simulateSNPs(1000, 50, c(3, 2), list.ia.val = list1,
    list.equal = list2, vec.ia.num = c(200, 250), maf = 0.25)
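
One way to check the last example is to evaluate the two interactions directly on the simulated genotypes and cross-tabulate them against the case-control status. The sketch below rests on several assumptions: scrime is installed, the genotypes in data use the same 0/1/2 coding as list.ia.val, the explanatory SNPs occupy the first sum(vec.ia) columns (the default with shuffle = FALSE), and no SNP was dropped by rm.redundancy. The seed is arbitrary, and the ia component shows which interactions were actually used.

```r
# Assumes the scrime package is installed; the block is skipped otherwise.
if (requireNamespace("scrime", quietly = TRUE)) {
  sim3 <- scrime::simulateSNPs(1000, 50, c(3, 2),
    list.ia.val = list(c(2, 0, 1), c(0, 2)),
    list.equal = list(c(1, 0, 1), c(1, 0)),
    vec.ia.num = c(200, 250), maf = 0.25, rand = 1)

  x <- sim3$data
  # Interactions as written in the example comments (assumed column order).
  ia1 <- x[, 1] == 2 & x[, 2] != 0 & x[, 3] == 1
  ia2 <- x[, 4] == 0 & x[, 5] != 2

  print(sim3$ia)
  print(table(interaction1 = ia1, status = sim3$cl))
  print(table(interaction2 = ia2, status = sim3$cl))
}
```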
