makeFabiaDataPos: Generation of Bicluster Data

Description

makeFabiaDataPos: R implementation of makeFabiaDataPos.

Usage

makeFabiaDataPos(n,l,p,f1,f2,of1,of2,sd_noise,sd_z_noise,
              mean_z,sd_z,sd_l_noise,mean_l,sd_l)

Arguments

n

number of observations.

l

number of samples.

p

number of biclusters.

f1

at most l/f1 additional samples are active in a bicluster.

f2

at most n/f2 additional observations form a pattern in a bicluster.

of1

minimum number of active samples in a bicluster.

of2

minimum number of observations that form a pattern in a bicluster.

sd_noise

Gaussian zero mean noise std on data matrix.

sd_z_noise

Gaussian zero mean noise std for deactivated hidden factors.

mean_z

Gaussian mean for activated factors.

sd_z

Gaussian std for activated factors.

sd_l_noise

Gaussian zero mean noise std if no observation patterns are present.

mean_l

Gaussian mean for observation patterns.

sd_l

Gaussian std for observation patterns.

Value

X

the noisy data from $R^{n \times l}$.

Y

the noise-free data from $R^{n \times l}$.

ZC

list whose i-th element gives the samples belonging to the i-th bicluster.

LC

list whose i-th element gives the observations belonging to the i-th bicluster.

concept

biclustering
sparse coding
sparse matrix factorization

Details

Essentially, the data generation model is a sum of outer products of sparse vectors: $$X = \sum_{i=1}^{p} \lambda_i z_i^T + U$$ where the number of summands $p$ is the number of biclusters. In matrix form the factorization is $$X = L Z + U$$ and the noise-free data is $$Y = L Z$$

Here $\lambda_i$ are from $R^n$, $z_i$ from $R^l$, $L$ from $R^{n \times p}$, $Z$ from $R^{p \times l}$, and $X$, $U$, $Y$ from $R^{n \times l}$.
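The equivalence between the sum of outer products and the matrix product can be checked with a few lines of base R. This is only an illustrative sketch: the dimensions and the dense random entries are arbitrary assumptions, and no fabia function is called.

```r
set.seed(1)
n <- 6; l <- 4; p <- 2   # small, arbitrary dimensions chosen for illustration

L <- matrix(rnorm(n * p), nrow = n, ncol = p)   # column i plays the role of lambda_i
Z <- matrix(rnorm(p * l), nrow = p, ncol = l)   # row i plays the role of z_i^T

# Sum of the p outer products lambda_i z_i^T
Y_sum <- matrix(0, nrow = n, ncol = l)
for (i in seq_len(p)) {
  Y_sum <- Y_sum + outer(L[, i], Z[i, ])
}

Y <- L %*% Z                                     # noise-free data
U <- matrix(rnorm(n * l, sd = 0.1), nrow = n)    # additive Gaussian noise
X <- Y + U                                       # noisy data

# Both formulations give the same noise-free matrix
stopifnot(isTRUE(all.equal(Y_sum, Y)))
```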

The columns $L_i$ of $L$ (the vectors $\lambda_i$) are generated sequentially using n, f2, of2, sd_l_noise, mean_l, sd_l. of2 gives the minimum number of observations participating in a bicluster, to which between 0 and $n/f2$ observations are added, where the number is chosen uniformly. sd_l_noise gives the noise of observations not participating in the bicluster. mean_l and sd_l determine the Gaussian from which the values are drawn for the observations that participate in the bicluster. "POS": the sign of the mean is fixed.

The rows $Z_i$ of $Z$ (the vectors $z_i^T$) are generated sequentially using l, f1, of1, sd_z_noise, mean_z, sd_z. of1 gives the minimum number of samples participating in a bicluster, to which between 0 and $l/f1$ samples are added, where the number is chosen uniformly. sd_z_noise gives the noise of samples not participating in the bicluster. mean_z and sd_z determine the Gaussian from which the values are drawn for the samples that participate in the bicluster.

$U$ is the overall zero-mean Gaussian noise with standard deviation sd_noise.
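The generation steps described above can be approximated in base R as follows. This is a rough sketch, not the package's source code: the helpers `sketch_sparse_vector` and `sketch_fabia_data` are hypothetical names, and the assumption that bicluster members are picked uniformly at random without replacement is mine rather than something stated in this page.

```r
# Hypothetical helper (not part of fabia): draw one sparse vector of length `len`.
sketch_sparse_vector <- function(len, f, of, sd_noise, mu, sigma) {
  extra <- sample.int(floor(len / f) + 1L, 1L) - 1L  # uniform on 0..floor(len/f)
  size <- min(of + extra, len)                       # number of participating entries
  members <- sort(sample.int(len, size))             # assumption: uniform, no replacement
  v <- rnorm(len, mean = 0, sd = sd_noise)           # non-participating entries: noise only
  v[members] <- rnorm(size, mean = mu, sd = sigma)   # participating entries
  list(values = v, members = members)
}

# Hypothetical helper (not part of fabia): assemble X, Y, ZC, LC as described.
sketch_fabia_data <- function(n, l, p, f1, f2, of1, of2, sd_noise, sd_z_noise,
                              mean_z, sd_z, sd_l_noise, mean_l, sd_l) {
  L <- matrix(0, nrow = n, ncol = p)
  Z <- matrix(0, nrow = p, ncol = l)
  LC <- vector("list", p)
  ZC <- vector("list", p)
  for (i in seq_len(p)) {
    li <- sketch_sparse_vector(n, f2, of2, sd_l_noise, mean_l, sd_l)
    zi <- sketch_sparse_vector(l, f1, of1, sd_z_noise, mean_z, sd_z)
    L[, i] <- li$values
    Z[i, ] <- zi$values
    LC[[i]] <- li$members
    ZC[[i]] <- zi$members
  }
  Y <- L %*% Z                                          # noise-free data
  X <- Y + matrix(rnorm(n * l, sd = sd_noise), nrow = n) # add overall noise U
  list(X = X, Y = Y, ZC = ZC, LC = LC)
}

set.seed(42)
sk <- sketch_fabia_data(n = 100, l = 50, p = 3, f1 = 5, f2 = 5,
                        of1 = 5, of2 = 10, sd_noise = 3.0, sd_z_noise = 0.2,
                        mean_z = 2.0, sd_z = 1.0, sd_l_noise = 0.2,
                        mean_l = 3.0, sd_l = 1.0)
```

With these settings each sketched bicluster contains between of2 = 10 and of2 + n/f2 = 30 observations and between of1 = 5 and of1 + l/f1 = 15 samples, which `lengths(sk$LC)` and `lengths(sk$ZC)` make visible.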

Implementation in R.

Examples


library(fabia)  # provides makeFabiaDataPos and matrixImagePlot

#---------------
# TEST
#---------------

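# Small data set: 100 observations, 50 samples, 3 biclusters
dat <- makeFabiaDataPos(n = 100, l = 50, p = 3, f1 = 5, f2 = 5,
  of1 = 5, of2 = 10, sd_noise = 3.0, sd_z_noise = 0.2, mean_z = 2.0,
  sd_z = 1.0, sd_l_noise = 0.2, mean_l = 3.0, sd_l = 1.0)

X <- dat[[1]]  # noisy data
Y <- dat[[2]]  # noise-free data

matrixImagePlot(Y)
dev.new()      # open a second graphics device for the noisy data
matrixImagePlot(X)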


#---------------
# DEMO
#---------------

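# Larger data set: 1000 observations, 100 samples, 10 biclusters
dat <- makeFabiaDataPos(n = 1000, l = 100, p = 10, f1 = 5, f2 = 5,
  of1 = 5, of2 = 10, sd_noise = 3.0, sd_z_noise = 0.2, mean_z = 2.0,
  sd_z = 1.0, sd_l_noise = 0.2, mean_l = 3.0, sd_l = 1.0)

X <- dat[[1]]  # noisy data
Y <- dat[[2]]  # noise-free data

matrixImagePlot(Y)
dev.new()      # open a second graphics device for the noisy data
matrixImagePlot(X)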
