RDRToolbox (version 1.18.0)

generateData: Simulator for gene expression data

Description

Simulates gene expression data whose values are normally distributed with zero mean. The covariances are given by a configurable block-diagonal matrix. By default, half of the samples contain differential gene expression values (see parameter diffsamples).
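
As a rough illustration of the covariance structure described here, the base-R sketch below builds a small block-structured covariance matrix and draws zero-mean normal samples from it. It is not the package's implementation: block_cov is a hypothetical helper, and the unit variances on the diagonal are an assumption, since the documentation only specifies the off-diagonal covariances.

```r
# Hypothetical helper (not part of RDRToolbox): covariance matrix with
# cov1 inside each block, cov2 between blocks, and unit variances (assumed).
block_cov <- function(genes, blocksize, cov1 = 0.2, cov2 = 0) {
  block_id <- (seq_len(genes) - 1) %/% blocksize          # block index of each gene
  sigma <- ifelse(outer(block_id, block_id, "=="), cov1, cov2)
  diag(sigma) <- 1                                        # assumption: variance 1 per gene
  sigma
}

set.seed(1)
sigma <- block_cov(genes = 6, blocksize = 3)

# Draw 4 samples of 6 genes: independent N(0, 1) values, then mix them with
# the Cholesky factor so that each row has covariance sigma.
z <- matrix(rnorm(4 * 6), nrow = 4)
x <- z %*% chol(sigma)
```

With cov2 = 0 the matrix is block-diagonal, so genes in different blocks are uncorrelated.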

Usage

generateData(samples=50, genes=10000, diffgenes=200, blocksize=50, cov1=0.2, cov2=0, diff=0.6, diffsamples)

Arguments

samples
number of samples
genes
number of gene expression values per sample
diffgenes
number of differentially expressed genes in class 1
blocksize
size of each block in the block-diagonal correlation matrix
cov1
covariance within the blocks in the correlation matrix
cov2
covariance between the blocks in the correlation matrix
diff
difference between the random gene expression values and the differential gene expression values
diffsamples
number of samples containing differential gene expression values compared to the rest (if missing, this parameter is set to half of the total number of samples)

Value

'generateData' returns a list containing:
data
a (samples x genes) matrix with the simulated gene expression values (one row per sample, one column per gene)
labels
a vector with labels (1,-1) for the two classes

Details

The simulator generates two labeled classes:
label 1
samples with differentially expressed genes
label -1
samples without differentially expressed genes
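
To make the two classes concrete, here is a minimal base-R sketch of how such labels and a differential shift could be produced. It is only an illustration under stated assumptions: which samples and genes receive the shift (here the first diffsamples rows and the first diffgenes columns) and the use of a simple additive offset are guesses, not documented behavior of generateData.

```r
set.seed(1)
samples <- 6
genes <- 8
diffgenes <- 3
diff <- 0.6
diffsamples <- samples / 2   # documented default: half of the samples

# Background expression: independent zero-mean normal values
x <- matrix(rnorm(samples * genes), nrow = samples)

# Assumption: class 1 is the first `diffsamples` rows, and its differential
# genes are the first `diffgenes` columns, shifted by `diff`.
rows <- seq_len(diffsamples)
cols <- seq_len(diffgenes)
x[rows, cols] <- x[rows, cols] + diff

# Labels: 1 for samples with differential expression, -1 for the rest
labels <- c(rep(1, diffsamples), rep(-1, samples - diffsamples))
```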

Examples

## generate a dataset with 20 samples and 1000 gene expression values
d <- generateData(samples = 20, genes = 1000, diffgenes = 100, blocksize = 10)
data <- d[[1]]    # matrix of simulated expression values
labels <- d[[2]]  # class labels (1, -1)
