Usage
make.sim.data.sd(N, param, samples = c(5, 5),
ndeg = rep(round(0.1*N), 2), fc.basis = 1.5,
libsize.range = c(0.7, 1.4), libsize.mag = 1e+7,
model.org = NULL, sim.length.bias = FALSE,
seed = NULL)
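To make the defaults concrete, the following sketch evaluates the default expressions from the signature above for a hypothetical N = 1000. It uses only base R and does not call make.sim.data.sd itself; the value of N and the name libsize.bounds are assumptions introduced for illustration.

```r
# Hypothetical gene count, chosen only for illustration
N <- 1000

# Default for ndeg, copied from the usage above:
# 10% of N up-regulated and 10% of N down-regulated genes
ndeg <- rep(round(0.1 * N), 2)

# Default library size controls, copied from the usage above
libsize.range <- c(0.7, 1.4)
libsize.mag <- 1e+7

# The two are multiplied, so the defaults correspond to these
# lower and upper library size bounds (illustrative name)
libsize.bounds <- libsize.range * libsize.mag

print(ndeg)
print(libsize.bounds)
```

With these defaults, 100 genes are requested in each direction and the library sizes fall between 7 and 14 million reads.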
Arguments
N
the number of genes to produce.
param
a named list with negative binomial
parameter sets to sample from. The first member is
the mean parameter to sample from (mu.hat)
and the second is the dispersion (phi.hat).
This list can be created with the
estimate.sim.params function.
samples
a vector with 2 integers,
which are the number of samples for each
condition (two conditions currently supported).
ndeg
a vector with 2 integers, which are
the number of differentially expressed genes to
be produced. The first element is the number of
up-regulated genes while the second is the
number of down-regulated genes.
fc.basis
the minimum fold-change for
deregulation.
libsize.range
a vector with 2 numbers
(generally small, see the default), as they
are multiplied with libsize.mag. These
numbers control the library sizes of the
synthetic data to be produced.
libsize.mag
a (big) number to multiply
the libsize.range by to produce library
sizes.
model.org
the organism from which the
real data are derived. It must be one
of the supported organisms (see the main
metaseqr help page). It is used
to sample real values for GC content.
sim.length.bias
a boolean to instruct
the simulator to create genes whose read counts are
proportional to their length. This is achieved by
sorting in increasing order the mean parameter of
the negative binomial distribution (and the
dispersion according to the mean), which causes
the sampled gene counts to increase accordingly.
The sampled lengths are also sorted so that in the
final gene list, shorter genes have fewer counts
than longer ones. The default is FALSE.
seed
a seed to use with random number
generation for reproducibility.
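
The sorting described for sim.length.bias, together with the role of seed, can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the parameter values and gene lengths are made up, all variable names are hypothetical, and the dispersion is assumed to map to the rnbinom size argument as size = 1/phi.

```r
# Fixed seed so that the sampled counts are reproducible
set.seed(42)

n <- 6

# Hypothetical negative binomial parameters
# (stand-ins for the mu.hat and phi.hat members of param)
mu.hat <- c(50, 5, 500, 20, 200, 100)
phi.hat <- c(0.30, 0.90, 0.05, 0.50, 0.10, 0.20)

# Hypothetical gene lengths in base pairs
gene.length <- c(3000, 800, 12000, 1500, 600, 5000)

# Sort the means in increasing order and reorder the
# dispersions so that each one stays with its mean
ord <- order(mu.hat)
mu.sorted <- mu.hat[ord]
phi.sorted <- phi.hat[ord]

# Sort the lengths in increasing order so that short genes
# are paired with small means and long genes with large means
length.sorted <- sort(gene.length)

# Draw one count per gene; size = 1/phi is an assumption
# (variance = mu + phi * mu^2)
counts <- rnbinom(n, mu = mu.sorted, size = 1 / phi.sorted)

sim <- data.frame(length = length.sorted, mu = mu.sorted,
                  phi = phi.sorted, counts = counts)
print(sim)
```

Because the counts are random draws, individual genes can deviate from the trend; the sorting only guarantees that the expected count increases with gene length. Resetting the same seed before sampling reproduces the same counts.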