cellGeometry (version 0.5.7)

add_noise: Add noise to count data

Description

Gaussian noise can be added to a simulated count matrix in several ways, which can be combined.

Usage

add_noise(counts, sd = 100)

log_noise(counts, sd = 0.1)

graded_log_noise(counts, sd = 0.1, transform = function(x) x^3)

sqrt_noise(counts, sd = 100)

shift_noise(counts, sd = 0.5, p = 0.5)

Value

A positive integer count matrix with genes in rows and cell subclasses in columns.

Arguments

counts

An integer count matrix with genes in rows and cell subclasses in columns, typically generated by simulate_bulk().

sd

Standard deviation of noise to be added.

transform

Function for controlling amount of noise by expression level in graded_log_noise().

p

Proportion of genes affected by noise.
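
To give a feel for the transform argument, the following base R sketch evaluates the default function(x) x^3 on an inverted scaling factor that runs from 1 (lowest expression) to 0 (highest expression), as described under Details. The vector s and the variable mult are hypothetical illustrations and are not taken from the package's internal code.

```r
# Hypothetical inverted scaling factor: 1 = lowest expression, 0 = highest
s <- seq(1, 0, by = -0.25)

# Default transform shown in the Usage section
transform <- function(x) x^3

# Resulting noise multiplier at each expression level
mult <- transform(s)
print(data.frame(scaling = s, multiplier = mult))
```

With the cubic default, a gene halfway along the scale (s = 0.5) receives a multiplier of 0.125, so mid-range counts are affected far less than the lowest counts. A transform closer to linear, such as function(x) x, would presumably spread the noise more evenly across expression levels.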

Details

  • add_noise adds simple Gaussian noise directly to the counts. This mainly affects lowly expressed genes and hardly affects highly expressed genes.

  • With log_noise, counts are transformed to log2(counts + 1), Gaussian noise is added, and the result is converted back to the count scale. This affects all genes irrespective of expression level.

  • With graded_log_noise, counts are transformed to log2(counts + 1). A scaling factor from 0 to 1 is calculated from gene expression level, where 0 corresponds to zero counts and 1 to the maximum number of counts. This scaling factor is inverted so that it runs from 1 to 0 (i.e. noise affects low counts more than high counts) and is then passed through the function specified by transform, which controls how much the middle counts are affected. Finally, the Gaussian noise is multiplied by the scaling factor and added to the counts.

  • With sqrt_noise, counts are square root transformed before Gaussian noise is added, and then transformed back. This still has a stronger effect on lowly expressed genes, but the effect falls off more gradually as expression increases.

  • With shift_noise, whole gene rows are selected at random, then each selected row is multiplied by a random factor drawn as 2^rnorm. This simulates expression shifted up or down due to differences in chemistry that make some genes more or less detectable.
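
The noise models above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the sketch_* functions, the to_counts() and gauss() helpers and the toy matrix are all hypothetical, and details such as rounding, clamping negative values to zero, how sd is scaled in the square root case, and whether the graded scaling factor is computed on the log scale are assumptions that the documentation does not specify.

```r
set.seed(1)  # reproducible example

# Toy count matrix: 4 genes (rows) x 3 cell subclasses (columns)
counts <- matrix(c(0, 5, 200, 10000,
                   2, 8, 150, 12000,
                   1, 3, 300,  9000),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))

# Helper (assumption): round to integers and clamp negatives to zero
to_counts <- function(x) {
  x <- round(x)
  x[x < 0] <- 0
  storage.mode(x) <- "integer"
  x
}

# Helper: Gaussian noise matrix with the same shape as m
gauss <- function(m, sd) matrix(rnorm(length(m), sd = sd), nrow = nrow(m))

# Additive noise on the raw count scale
sketch_add <- function(m, sd = 100) to_counts(m + gauss(m, sd))

# Noise added on the log2(x + 1) scale, then back-transformed
sketch_log <- function(m, sd = 0.1) {
  to_counts(2^(log2(m + 1) + gauss(m, sd)) - 1)
}

# Noise on the log scale, damped by an inverted, transformed scaling factor
# (assumption: the 0..1 scale is computed from the log-transformed counts)
sketch_graded <- function(m, sd = 0.1, transform = function(x) x^3) {
  lg <- log2(m + 1)
  s <- transform(1 - lg / max(lg))
  to_counts(2^(lg + gauss(m, sd) * s) - 1)
}

# Noise added on the square root scale, then squared back
# (assumption: sd here is on the square root scale; values below 0 are clamped)
sketch_sqrt <- function(m, sd = 1) {
  to_counts(pmax(sqrt(m) + gauss(m, sd), 0)^2)
}

# Random rows multiplied by 2^rnorm, one factor per selected gene
sketch_shift <- function(m, sd = 0.5, p = 0.5) {
  rows <- sample(nrow(m), size = round(p * nrow(m)))
  factors <- 2^rnorm(length(rows), sd = sd)
  m[rows, ] <- m[rows, , drop = FALSE] * factors  # recycled down each column
  to_counts(m)
}

# The models can be combined by chaining, e.g. a row shift then log noise
noisy <- sketch_log(sketch_shift(counts))
print(noisy)
```

Comparing sketch_add(counts) with sketch_log(counts) on the toy matrix should show the contrast described above: additive noise with sd = 100 swamps the genes with single-digit counts while barely changing genes in the thousands, whereas log-scale noise perturbs every gene by a similar relative amount.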