SimulateIsoforms() performs a simple simulation of isoform-specific kinetic
parameters to showcase and test EstimateIsoformFractions(). It assumes that
there are a set of reads (fraction of total set by funique parameter) which
map uniquely to a given isoform, while the rest are ambiguous to all isoforms
from that gene. Mutational content of these reads are simulated as in
SimulateOneRep().
SimulateIsoforms(
nfeatures,
nt = NULL,
seqdepth = nfeatures * 2500,
label_time = 4,
sample_name = "sampleA",
feature_prefix = "Gene",
pnew = 0.1,
pold = 0.002,
funique = 0.2,
readlength = 200,
Ucont = 0.25,
avg_numiso = 2,
psynthdiff = 0.5,
logkdeg_mean = -1.9,
logkdeg_sd = 0.7,
logksyn_mean = 2.3,
logksyn_sd = 0.7
)List with two elements:
cB: Tibble that can be passed as the cB arg to EZbakRData().
ground_truth: Tibble containing simulated ground truth.
Number of "features" to simulate data for. Each feature will have a simulated number of transcript isoforms
(Optional), can provide a vector of the number of isoforms you would
like to simulate for each of the nfeatures features. Vector can either be length
1, in which case that many isoforms will be simulated for all features, or length
equal to nfeatures.
Total number of sequencing reads to simulate
Length of s^4^U feed to simulate.
Character vector to assign to sample column of output simulated
data table (the cB table).
Name given to the i-th feature is paste0(feature_prefix, i). Shows up in the
feature column of the output simulated data table.
Probability that a T is mutated to a C if a read is new.
Probability that a T is mutated to a C if a read is old.
Fraction of reads that uniquely "map" to a single isoform.
Length of simulated reads. In this simple simulation, all reads are simulated as being exactly this length.
Percentage of nucleotides simulated to be U's.
Average number of isoforms for each feature. Feature-specific
isoform counts are drawn from a Poisson distribution with this average. NOTE:
to insure that all features have multiple isoforms, the simulated number of
isoforms drawn from a Poisson distribution is incremented by 2. Thus, the
actual average number of isoforms from each feature is avg_numiso + 2.
Percentage of genes for which all isoform abundance differences are synthesis driven. If not synthesis driven, then isoform abundance differences will be driven by differences in isoform kdegs.
meanlog of a log-normal distribution from which kdegs are simulated
sdlog of a log-normal distribution from which kdegs are simulated
meanlog of a log-normal distribution from which ksyns are simulated
sdlog of a log-normal distribution from which ksyns are simulated
simdata <- SimulateIsoforms(30)
Run the code above in your browser using DataLab