nmfsc: Non-negative Sparse Matrix Factorization

Description

nmfsc: R implementation of nmfsc.

Usage

nmfsc(X,p=5,cyc=100,sL=0.6,sZ=0.6)

Arguments

X

the data matrix.

p

number of hidden factors = number of biclusters; default = 5.

cyc

maximal number of iterations; default = 100.

sL

sparseness of the loadings; default = 0.6.

sZ

sparseness of the factors; default = 0.6.

Value

Object of the class Factorization, containing LZ (estimated noise-free data $L Z$), L (loadings $L$), Z (factors $Z$), U (noise $X-LZ$), and X (data $X$).

concept

sparse coding
non-negative matrix factorization

Details

Non-negative Matrix Factorization represents a non-negative matrix $X$ by non-negative matrices $L$ and $Z$ that are sparse.

The reconstruction objective is the Euclidean distance, subject to sparseness constraints.

Essentially the model is the sum of outer products of vectors: $$X = \sum_{i=1}^{p} \lambda_i z_i^T$$ where the number of summands $p$ is the number of biclusters. The matrix factorization is $$X = L Z$$

Here $\lambda_i$ are from $R^n$, $z_i$ from $R^l$, $L$ from $R^{n \times p}$, $Z$ from $R^{p \times l}$, and $X$ from $R^{n \times l}$.
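The equivalence of the two forms above can be checked numerically. The following is a minimal base-R sketch; the toy dimensions and the random matrices are assumptions chosen only for illustration and are not part of the package.

```r
# Hypothetical toy dimensions, chosen only for illustration
set.seed(1)
n <- 6; l <- 4; p <- 2
L <- matrix(runif(n * p), nrow = n, ncol = p)  # columns play the role of lambda_i
Z <- matrix(runif(p * l), nrow = p, ncol = l)  # rows play the role of z_i^T

# Sum of the p outer products lambda_i z_i^T
X_sum <- matrix(0, n, l)
for (i in seq_len(p)) {
  X_sum <- X_sum + outer(L[, i], Z[i, ])
}

# The same matrix written as the product L Z
X_prod <- L %*% Z

# TRUE if both forms agree up to floating point tolerance
isTRUE(all.equal(X_sum, X_prod))
```

Each summand is an $n \times l$ matrix of rank at most one, so $p$ bounds the rank of the reconstruction.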

If the nonzero components of the sparse vectors are grouped together then the outer product results in a matrix with a nonzero block and zeros elsewhere.
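The block structure described above can be seen in a small base-R sketch. The two vectors are made-up illustrative values, not output of nmfsc.

```r
# Hypothetical sparse vectors with their nonzero entries grouped together
lambda <- c(0, 0, 2, 3, 1, 0, 0, 0)  # nonzero in positions 3 to 5
z      <- c(0, 1, 4, 0, 0, 0)        # nonzero in positions 2 to 3

# The outer product is an 8 x 6 matrix
B <- outer(lambda, z)

# Rows and columns that carry the nonzero block
block_rows <- which(rowSums(B) > 0)
block_cols <- which(colSums(B) > 0)

print(B)
```

Only the entries in rows 3 to 5 and columns 2 to 3 are nonzero, which is the kind of rectangular block that is interpreted as a bicluster.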

The model selection is performed by a constrained optimization according to Hoyer, 2004. The Euclidean distance (the Frobenius norm) is minimized subject to sparseness and non-negativity constraints.

Model selection is done by gradient descent on the Euclidean objective, followed by a projection of the individual vectors of $L$ and of $Z$ so that they fulfill the sparseness and non-negativity constraints.
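The gradient-then-project pattern can be sketched in base R for the factors $Z$. This is not the package's code: the sparseness projection is replaced here by a plain clamp to non-negative values, and the toy data, the step size rule, and the helper `objective` are assumptions made only for illustration.

```r
set.seed(42)
n <- 8; l <- 6; p <- 2
X <- matrix(runif(n * l), n, l)  # non-negative toy data
L <- matrix(runif(n * p), n, p)  # current loadings
Z <- matrix(runif(p * l), p, l)  # current factors

# Euclidean objective 0.5 * ||X - L Z||_F^2 (hypothetical helper)
objective <- function(X, L, Z) 0.5 * sum((X - L %*% Z)^2)

# Gradient of the objective with respect to Z
grad_Z <- -t(L) %*% (X - L %*% Z)

# Step size 1 / ||L||_2^2, the reciprocal of the gradient's Lipschitz constant
step <- 1 / norm(L, type = "2")^2

# Gradient step, then clamp to non-negative values.
# NOTE: the clamp is a simplified stand-in for the sparseness projection.
Z_new <- pmax(Z - step * grad_Z, 0)

obj_before <- objective(X, L, Z)
obj_after  <- objective(X, L, Z_new)
c(before = obj_before, after = obj_after)
```

With this step size the projected step does not increase the objective. An analogous update for $L$ would use the gradient $-(X - LZ) Z^T$.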

The projection minimizes the Euclidean distance to the original vector subject to a given $l_1$-norm, a given $l_2$-norm, and non-negativity.

The projection is a convex quadratic problem which is solved iteratively, where at each iteration at least one component is set to zero. Instead of the $l_1$-norm, a sparseness measure is used which relates the $l_1$-norm to the $l_2$-norm.
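The sparseness measure of Hoyer (2004) for a vector $x$ of length $n$ is $(\sqrt{n} - \|x\|_1 / \|x\|_2) / (\sqrt{n} - 1)$. The base-R sketch below computes it and the $l_1$-norm implied by a given sparseness and $l_2$-norm. The function names `hoyer_sparseness` and `l1_target` are hypothetical helpers introduced for illustration and are not exported by the package.

```r
# Sparseness measure of Hoyer (2004); hypothetical helper name.
# Requires a nonzero vector of length n > 1.
hoyer_sparseness <- function(x) {
  n <- length(x)
  (sqrt(n) - sum(abs(x)) / sqrt(sum(x^2))) / (sqrt(n) - 1)
}

# l1-norm implied by sparseness s, l2-norm l2 and length n
# (the definition above solved for the l1-norm); hypothetical helper name.
l1_target <- function(s, l2, n) {
  l2 * (sqrt(n) - s * (sqrt(n) - 1))
}

hoyer_sparseness(c(0, 0, 5, 0))  # one nonzero entry: sparseness 1
hoyer_sparseness(rep(1, 4))      # all entries equal: sparseness 0

# l1-norm that a unit-length vector with 100 entries needs
# to reach the default sparseness of 0.6
l1_target(0.6, l2 = 1, n = 100)
```

Fixing the sparseness and the $l_2$-norm therefore fixes the $l_1$-norm, which is how a sparseness value such as sL or sZ translates into the norm constraints of the projection.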

The code is implemented in R.

References

Patrik O. Hoyer, Non-negative Matrix Factorization with Sparseness Constraints, Journal of Machine Learning Research 5:1457-1469, 2004.

D. D. Lee and H. S. Seung, Algorithms for non-negative matrix factorization, In Advances in Neural Information Processing Systems 13, 556-562, 2001.

Examples

#---------------
# TEST
#---------------

dat <- makeFabiaDataBlocks(n = 100,l= 50,p = 3,f1 = 5,f2 = 5,
  of1 = 5,of2 = 10,sd_noise = 3.0,sd_z_noise = 0.2,mean_z = 2.0,
  sd_z = 1.0,sd_l_noise = 0.2,mean_l = 3.0,sd_l = 1.0)

X <- dat[[1]]
Y <- dat[[2]]
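# nmfsc expects non-negative data, so take absolute values
X <- abs(X)

# 3 biclusters, at most 30 iterations
resEx <- nmfsc(X, p = 3, cyc = 30, sL = 0.6, sZ = 0.6)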


#---------------
# DEMO
#---------------

dat <- makeFabiaDataBlocks(n = 1000,l= 100,p = 10,f1 = 5,f2 = 5,
  of1 = 5,of2 = 10,sd_noise = 3.0,sd_z_noise = 0.2,mean_z = 2.0,
  sd_z = 1.0,sd_l_noise = 0.2,mean_l = 3.0,sd_l = 1.0)

X <- dat[[1]]
Y <- dat[[2]]
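# nmfsc expects non-negative data, so take absolute values
X <- abs(X)

# 13 biclusters, at most 100 iterations
resToy <- nmfsc(X, p = 13, cyc = 100, sL = 0.6, sZ = 0.6)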

extractPlot(resToy,ti="NMFSC",Y=Y)
