makeFabiaDataBlocks: Generation of Bicluster Data with Bicluster Blocks

Description

makeFabiaDataBlocks: R implementation of makeFabiaDataBlocks.

Usage

makeFabiaDataBlocks(n,l,p,f1,f2,of1,of2,sd_noise,sd_z_noise,
              mean_z,sd_z,sd_l_noise,mean_l,sd_l)

Arguments

n

number of observations.

l

number of samples.

p

number of biclusters.

f1

l/f1 max. additional samples are active in a bicluster.

f2

n/f2 max. additional observations that form a pattern in a bicluster.

of1

minimal active samples in a bicluster.

of2

minimal observations that form a pattern in a bicluster.

sd_noise

Gaussian zero mean noise std on data matrix.

sd_z_noise

Gaussian zero mean noise std for deactivated hidden factors.

mean_z

Gaussian mean for activated factors.

sd_z

Gaussian std for activated factors.

sd_l_noise

Gaussian zero mean noise std if no observation patterns are present.

mean_l

Gaussian mean for observation patterns.

sd_l

Gaussian std for observation patterns.

Value

X: the noisy data from $R^{n \times l}$.
Y: the noise-free data from $R^{n \times l}$.
ZC: list where the i-th element gives the samples belonging to the i-th bicluster.
LC: list where the i-th element gives the observations belonging to the i-th bicluster.

concept

biclustering
sparse coding
sparse matrix factorization

Details

Bicluster data is generated in block format to ease visualization: the observations and samples that belong to a bicluster are consecutive. This allows visual inspection because the user can identify the blocks and see whether they have been found or reconstructed.

Essentially the data generation model is the sum of outer products of sparse vectors: $$X = \sum_{i=1}^{p} \lambda_i z_i^T + U$$ where the number of summands $p$ is the number of biclusters. The matrix factorization is $$X = L Z + U$$ and noise free $$Y = L Z$$

Here $\lambda_i$ are from $R^n$, $z_i$ from $R^l$, $L$ from $R^{n \times p}$, $Z$ from $R^{p \times l}$, and $X$, $U$, $Y$ from $R^{n \times l}$.
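The equivalence between the sum of outer products and the matrix product $L Z$ can be checked numerically. The following is a minimal base-R sketch with small, arbitrary dimensions and dense random factors; it does not call makeFabiaDataBlocks, and all variable names are chosen for illustration only.

```r
set.seed(1)
n <- 6  # observations
l <- 4  # samples
p <- 2  # biclusters

# Illustrative dense factors; the real generator uses sparse vectors
L <- matrix(rnorm(n * p), nrow = n, ncol = p)  # columns are lambda_i
Z <- matrix(rnorm(p * l), nrow = p, ncol = l)  # rows are z_i

# Sum of outer products lambda_i z_i^T
Y_sum <- matrix(0, nrow = n, ncol = l)
for (i in seq_len(p)) {
  Y_sum <- Y_sum + outer(L[, i], Z[i, ])
}

# Matrix factorization form
Y_mat <- L %*% Z

# Both forms agree up to floating point error
stopifnot(isTRUE(all.equal(Y_sum, Y_mat)))
```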

The $\lambda_i$ (the columns of $L$) are generated sequentially using n, f2, of2, sd_l_noise, mean_l, and sd_l. of2 gives the minimal number of observations participating in a bicluster, to which between 0 and $n/f2$ observations are added, where the number is chosen uniformly. sd_l_noise gives the noise std of observations not participating in the bicluster. mean_l and sd_l determine the Gaussian from which the values are drawn for the observations that participate in the bicluster. The sign of the mean is randomly chosen for each component.

The $z_i$ (the rows of $Z$) are generated sequentially using l, f1, of1, sd_z_noise, mean_z, and sd_z. of1 gives the minimal number of samples participating in a bicluster, to which between 0 and $l/f1$ samples are added, where the number is chosen uniformly. sd_z_noise gives the noise std of samples not participating in the bicluster. mean_z and sd_z determine the Gaussian from which the values are drawn for the samples that participate in the bicluster.
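The generation of a single block-sparse vector, as described above, might look roughly like the following sketch. The helper make_block_vector and its arguments start and random_sign are hypothetical and not part of fabia; in particular, where a block starts is an assumption here, and the package's own code may differ in detail.

```r
# Hypothetical helper (not part of fabia): one block-sparse factor vector.
make_block_vector <- function(len, f, of, sd_noise, mean_act, sd_act,
                              start = 1, random_sign = FALSE) {
  # Block size: minimum `of` plus a uniform draw from 0..floor(len / f)
  extra <- sample.int(floor(len / f) + 1, 1) - 1
  size <- min(of + extra, len - start + 1)
  idx <- seq(start, length.out = size)  # consecutive indices (block format)

  # Non-participating entries: zero-mean Gaussian noise
  v <- rnorm(len, mean = 0, sd = sd_noise)

  # Participating entries: Gaussian around the (optionally sign-flipped) mean
  mu <- rep(mean_act, size)
  if (random_sign) {
    mu <- mu * sample(c(-1, 1), size, replace = TRUE)
  }
  v[idx] <- rnorm(size, mean = mu, sd = sd_act)

  list(vector = v, members = idx)
}

set.seed(42)
# Observation pattern with n = 100, f2 = 5, of2 = 10: 10 to 30 members
lam <- make_block_vector(100, f = 5, of = 10, sd_noise = 0.2,
                         mean_act = 3.0, sd_act = 1.0, random_sign = TRUE)
# Sample pattern with l = 50, f1 = 5, of1 = 5: 5 to 15 members
z <- make_block_vector(50, f = 5, of = 5, sd_noise = 0.2,
                       mean_act = 2.0, sd_act = 1.0)
```

With the parameter values from the examples, a bicluster therefore spans between of2 = 10 and of2 + n/f2 = 30 observations, and between of1 = 5 and of1 + l/f1 = 15 samples.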

$U$ is the overall Gaussian zero mean noise generated by sd_noise.
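To get a feeling for how the bicluster signal compares with the noise $U$, one can simulate the entries inside a bicluster block. The sketch below is an illustration in base R under simplifying assumptions: it uses the parameter values from the examples, fixes the sign of the mean to be positive, and treats the participating entries of $\lambda_i$ and $z_i$ as independent draws. Under these assumptions the expected value of an entry inside a block is mean_l * mean_z.

```r
set.seed(123)
m <- 1e5  # number of simulated block entries (illustrative)

mean_l <- 3.0; sd_l <- 1.0  # observation pattern parameters
mean_z <- 2.0; sd_z <- 1.0  # activated factor parameters
sd_noise <- 3.0             # std of the additive noise U

# Entries of an outer product inside a bicluster block (positive sign only)
lambda_vals <- rnorm(m, mean = mean_l, sd = sd_l)
z_vals <- rnorm(m, mean = mean_z, sd = sd_z)
signal <- lambda_vals * z_vals

# Additive zero-mean Gaussian noise
noise <- rnorm(m, mean = 0, sd = sd_noise)

signal_mean <- mean(signal)  # close to mean_l * mean_z = 6
noise_sd <- sd(noise)        # close to sd_noise = 3
```

In this setting the block mean is about twice the noise standard deviation, which is why the blocks remain visible in an image plot of the noisy data.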

Implementation in R.

Examples


#---------------
# TEST
#---------------

library(fabia)  # provides makeFabiaDataBlocks and matrixImagePlot

dat <- makeFabiaDataBlocks(n = 100,l= 50,p = 3,f1 = 5,f2 = 5,
  of1 = 5,of2 = 10,sd_noise = 3.0,sd_z_noise = 0.2,mean_z = 2.0,
  sd_z = 1.0,sd_l_noise = 0.2,mean_l = 3.0,sd_l = 1.0)

X <- dat[[1]]
Y <- dat[[2]]

matrixImagePlot(Y)
dev.new()
matrixImagePlot(X)


#---------------
# DEMO
#---------------

dat <- makeFabiaDataBlocks(n = 1000,l= 100,p = 10,f1 = 5,f2 = 5,
  of1 = 5,of2 = 10,sd_noise = 3.0,sd_z_noise = 0.2,mean_z = 2.0,
  sd_z = 1.0,sd_l_noise = 0.2,mean_l = 3.0,sd_l = 1.0)

X <- dat[[1]]  # noisy data
Y <- dat[[2]]  # noise-free data

matrixImagePlot(Y)
dev.new()
matrixImagePlot(X)
