Gaussian noise can be added to the simulated count matrix in several ways, which can be combined.
add_noise(counts, sd = 100)
log_noise(counts, sd = 0.1)
graded_log_noise(counts, sd = 0.1, transform = function(x) x^3)
sqrt_noise(counts, sd = 100)
shift_noise(counts, sd = 0.5, p = 0.5)
A positive integer count matrix with genes in rows and cell subclasses in columns.
An integer count matrix with genes in rows and cell
subclasses in columns, typically generated by simulate_bulk().
Standard deviation of noise to be added.
Function for controlling the amount of noise by expression level
in graded_log_noise().
Proportion of genes affected by noise in shift_noise().
add_noise adds simple Gaussian noise to counts. This mainly affects lowly
expressed genes and hardly affects highly expressed genes.
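The additive case can be sketched in base R as follows. This is a minimal illustration only: the helper name add_noise_sketch, the rounding and the clamping of negative values to zero are assumptions, and the package's own add_noise() may differ.

```r
# Hypothetical helper for illustration; not the package's add_noise()
add_noise_sketch <- function(counts, sd = 100) {
  # Same Gaussian noise scale for every entry, regardless of expression level
  out <- round(counts + rnorm(length(counts), mean = 0, sd = sd))
  out[out < 0] <- 0                 # assumption: negative counts are clamped to zero
  storage.mode(out) <- "integer"    # keep an integer matrix with the original dims
  out
}

set.seed(1)
# Four genes spanning low to high expression, three columns
counts <- matrix(as.integer(c(5, 20, 1000, 50000)), nrow = 4, ncol = 3,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))
noisy <- add_noise_sketch(counts)
# Relative change per entry; typically far larger in the low-count rows
rel_change <- abs(noisy - counts) / counts
```

Because the noise has the same standard deviation for every entry, its size relative to the count shrinks as expression increases.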
With log_noise,
counts are converted to the log2(count + 1) scale, Gaussian noise is added,
and the result is converted back to count scale. This affects all genes
irrespective of expression level.
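A minimal sketch of the log-scale steps in base R is shown here. The helper name log_noise_sketch, the rounding and the clamping at zero are assumptions made for illustration; the package's log_noise() may handle these details differently.

```r
# Hypothetical helper for illustration; not the package's log_noise()
log_noise_sketch <- function(counts, sd = 0.1) {
  lg <- log2(counts + 1)                                # to log2(count + 1) scale
  lg <- lg + rnorm(length(lg), mean = 0, sd = sd)       # Gaussian noise on the log scale
  out <- round(2^lg - 1)                                # back to count scale (rounding assumed)
  out[out < 0] <- 0                                     # assumption: clamp negatives to zero
  storage.mode(out) <- "integer"
  out
}

set.seed(1)
counts <- matrix(rpois(20, lambda = 50), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4)))
noisy <- log_noise_sketch(counts, sd = 0.1)
```

Noise added on the log scale acts multiplicatively on the count scale, which is why all expression levels are affected to a similar relative degree.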
With graded_log_noise,
counts are converted to the log2(count + 1) scale. A scaling factor ranging
from 0 to 1 is calculated from gene expression level, with 0 to 1
corresponding to 0 to the maximum number of counts. This scaling factor is
inverted to run from 1 to 0 (i.e. noise affects low counts more than high
counts) and then passed through the function specified by transform (this
controls how much the middle counts are affected). The Gaussian noise is
then multiplied by the scaling factor and added to the counts.
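The graded scheme can be sketched as below. Several details are assumptions rather than facts about the package: the helper name graded_log_noise_sketch, computing the scaling factor on the log scale, adding the scaled noise on the log scale before converting back, and the rounding and clamping.

```r
# Hypothetical helper for illustration; not the package's graded_log_noise()
graded_log_noise_sketch <- function(counts, sd = 0.1,
                                    transform = function(x) x^3) {
  lg <- log2(counts + 1)
  # Assumption: scaling is computed on the log scale; needs at least one nonzero count
  scale <- lg / max(lg)                      # 0 (zero counts) to 1 (maximum count)
  scale <- transform(1 - scale)              # invert, then shape the fall-off
  noise <- rnorm(length(lg), mean = 0, sd = sd) * scale
  out <- round(2^(lg + noise) - 1)           # assumption: noise added on the log scale
  out[out < 0] <- 0
  storage.mode(out) <- "integer"
  out
}

set.seed(1)
counts <- matrix(rpois(20, lambda = 50), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4)))
noisy <- graded_log_noise_sketch(counts, sd = 0.1)
```

With the default transform of x^3 the inverted scaling factor is cubed, so in this sketch mid-range counts receive much less noise than they would with a linear fall-off, and the entry holding the maximum count receives none.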
With sqrt_noise, counts
are square root transformed before Gaussian noise is added, and then
transformed back. This still has a stronger effect on lowly expressed genes,
but the effect falls off more gradually as expression increases.
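A sketch of the square root approach in base R follows. The helper name sqrt_noise_sketch, the clamping before squaring and the rounding are assumptions for illustration and may not match the package's sqrt_noise().

```r
# Hypothetical helper for illustration; not the package's sqrt_noise()
sqrt_noise_sketch <- function(counts, sd = 100) {
  sq <- sqrt(counts) + rnorm(length(counts), mean = 0, sd = sd)
  # Assumption: clamp at zero before squaring so negative values
  # are not reflected back into positive counts
  sq[sq < 0] <- 0
  out <- round(sq^2)                 # back to count scale (rounding assumed)
  storage.mode(out) <- "integer"
  out
}

set.seed(1)
counts <- matrix(rpois(20, lambda = 50), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4)))
# A small sd is used here because noise is applied on the square root scale
noisy <- sqrt_noise_sketch(counts, sd = 1)
```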
With shift_noise, whole gene rows are selected at random, then each selected
row is multiplied by a random amount varying according to 2^rnorm. This
simulates expression shifted up or down due to differences in chemistry that
make some genes more or less detectable.
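The row-shifting idea can be sketched as below. The helper name shift_noise_sketch, the way the number of affected rows is derived from p, and the rounding are assumptions for illustration; the package's shift_noise() may select rows differently.

```r
# Hypothetical helper for illustration; not the package's shift_noise()
shift_noise_sketch <- function(counts, sd = 0.5, p = 0.5) {
  n <- nrow(counts)
  # Assumption: a fixed number of rows, round(n * p), is chosen without replacement
  rows <- sample(n, size = round(n * p))
  # One multiplier per selected gene, applied to that gene's whole row
  mult <- 2^rnorm(length(rows), mean = 0, sd = sd)
  out <- counts
  out[rows, ] <- round(counts[rows, , drop = FALSE] * mult)
  storage.mode(out) <- "integer"
  out
}

set.seed(1)
counts <- matrix(rpois(40, lambda = 50), nrow = 10,
                 dimnames = list(paste0("gene", 1:10), paste0("cell", 1:4)))
noisy <- shift_noise_sketch(counts, sd = 0.5, p = 0.5)
```

Since the multiplier is drawn once per gene, every column of a selected row is shifted by the same factor, while unselected rows are left unchanged.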