
nmfgpu4R (version 0.2.4)

nmf: Non-negative Matrix Factorization (NMF) on GPU

Description

Computes the non-negative matrix factorization of a data matrix X with factorization parameter r, i.e. two non-negative matrices W (n x r) and H (r x m) such that X is approximated by W %*% H. Multiple algorithms and initialization methods are implemented in the nmfgpu library using CUDA hardware acceleration. Depending on the available hardware, these algorithms can outperform traditional CPU implementations.
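
A minimal usage sketch (it requires a CUDA-capable GPU with the nmfgpu library installed, and uses only the documented return values W and H):

library(nmfgpu4R)
nmfgpu4R.init()
X <- matrix(runif(100*50), nrow=100)  # n = 100 attributes, m = 50 observations
model <- nmf(X, 5)                    # X is approximated by model$W %*% model$H
dim(model$W)                          # 100 x 5: basis features
dim(model$H)                          # 5 x 50: mixing vectors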

Usage

nmf(...)

## S3 method for class 'default':
nmf(data, r, algorithm = "mu",
  initMethod = "AllRandomValues", seed = floor(runif(1, 0,
  .Machine$integer.max)), threshold = 0.1, maxiter = 2000, runs = 1,
  parameters = NULL, useSinglePrecision = F, verbose = T, ssnmf = F,
  ...)

## S3 method for class 'formula':
nmf(formula, data, ...)

## S3 method for class 'nmfgpu':
fitted(object, ...)

## S3 method for class 'nmfgpu':
predict(object, newdata, ...)

Arguments

...
Further arguments passed to or from other methods.
data
Data matrix of dimension n x m with n attributes and m observations. Note that this is transposed relative to the observations-by-attributes convention of most other data mining/machine learning interfaces (hence the calls to t() in the Examples).
r
Factorization parameter r (the number of basis features, i.e. the inner rank of the approximation), which affects both the quality of the approximation and the runtime.
algorithm
Choosing the right algorithm depends on the data structure. Currently the following algorithms are implemented in the nmfgpu library:
  • mu: Multiplicative update rules presented by Lee and Seung [2], which use a purely multiplicative update scheme for W and H and therefore preserve non-negativity by construction.
  • gdcls: Gradient Descent Constrained Least Squares [3]; expects the regularization weight lambda in the parameters list (see Examples).
  • als: Alternating Least Squares as described by Langville et al. [4].
  • acls: Alternating Constrained Least Squares [4]; expects lambdaH and lambdaW in the parameters list.
  • ahcls: Alternating Hoyer Constrained Least Squares [4]; expects lambdaH, lambdaW, alphaH and alphaW in the parameters list.
  • nsnmf: Non-smooth Non-negative Matrix Factorization (nsNMF) presented by Pascual-Montano et al. [6]; expects the smoothing parameter theta in the parameters list.
initMethod
Method used to initialize the factorization matrices W and H. Currently the following initialization methods are implemented in the nmfgpu library:
  • CopyExisting: Initializes the factorization matrices W and H with existing values, which requires W and H to be set in the parameters argument. On the one hand this enables the user to chain different algorithms, for example using a fast converging algorithm for a base approximation and a slow algorithm with better convergence properties to finish the optimization process (see the Examples section). On the other hand the user can supply matrix initializations which are not supported by this interface. Note: Both W and H must have the same dimensions as they would have from the passed arguments data and r.
  • AllRandomValues: Initializes the factorization matrices W and H with uniformly distributed values between 0.0 and 1.0, where 0.0 is excluded and 1.0 is included.
  • MeanColumns: Initializes the factorization matrix W by computing the mean of five random data matrix columns. The matrix H will be initialized as it would when using AllRandomValues.
  • k-Means/Random: Initializes the factorization matrix W by computing the k-Means cluster centers of the data matrix. The matrix H will be initialized as it would when using AllRandomValues. This method was presented by Gong et al. [5] as initialization strategy H2.
  • k-Means/NonNegativeWTV: Initializes the factorization matrix W by computing the k-Means cluster centers of the data matrix. The matrix H will be initialized with the product t(W) %*% V, where all negative values are clamped to zero. This method was presented by Gong et al. [5] as initialization strategy H4.
  • EIn-NMF: Initializes the factorization matrix W by computing the k-Means cluster centers of the data matrix. The matrix H will be initialized with a prefix sum equation to build weighted encoding vectors. This method was presented by Gong et al. [5] as initialization strategy H5.
seed
Seed for the random number generator used by the initialization methods. By default a random seed is drawn, so repeated invocations produce different initializations unless a fixed seed is supplied.
threshold
Threshold for the convergence check: the algorithm is considered converged and stops as soon as the change of the error measure between two consecutive iterations falls below this value.
maxiter
Maximum number of iterations to execute, regardless of convergence.
runs
Number of factorization runs to perform. Since most initialization methods are random, multiple runs can be used to keep the best factorization.
parameters
Named list of algorithm-specific parameters, e.g. lambda for "gdcls", theta for "nsnmf", or the matrices W and H for the "CopyExisting" initialization method.
useSinglePrecision
If TRUE, all computations are performed in single precision instead of double precision, which is typically faster on GPUs at the cost of numerical accuracy.
verbose
If TRUE, status information is printed during algorithm execution.
ssnmf
Logical flag, defaults to FALSE.
formula
Formula specifying which variables of data should be used for the factorization; the formula interface forwards to the default method.
object
An object of class nmfgpu, as returned by nmf.
newdata
Data matrix of new observations to be encoded using the fitted model.

Value

If the factorization process was successful, a list with the following entries is returned, otherwise NULL:
  • W: Factorized matrix W with n attributes and r basis features of the data matrix.
  • H: Factorized matrix H with r mixing vectors for the m data entries in the data matrix.
  • Frobenius: The Frobenius norm of the factorization at the end of algorithm execution.
  • RMSD: The root-mean-square deviation (RMSD) of the factorization at the end of algorithm execution.
  • ElapsedTime: The elapsed time for initialization and algorithm execution.
  • NumIterations: Number of iterations until the algorithm converged.
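
The reported error measures can be recomputed from the returned factors. A short sketch, assuming result is a successful factorization of the matrix data and that RMSD is the Frobenius error scaled by sqrt(n*m):

approx <- result$W %*% result$H        # rank-r approximation of the data
frob <- norm(data - approx, type="F")  # should agree with result$Frobenius
rmsd <- frob / sqrt(length(data))      # cf. result$RMSD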

References

  1. P. Paatero and U. Tapper, "Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values", Environmetrics, vol. 5, no. 2, pp. 111-126, 1994.
  2. D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization", in Advances in Neural Information Processing Systems 13 (T. Leen, T. Dietterich, and V. Tresp, eds.), pp. 556-562, MIT Press, 2001.
  3. V. P. Pauca, J. Piper, and R. J. Plemmons, "Nonnegative matrix factorization for spectral data analysis", Linear Algebra and its Applications, vol. 416, no. 1, pp. 29-47, 2006. Special issue devoted to the Haifa 2005 conference on matrix theory.
  4. A. N. Langville, C. D. Meyer, R. Albright, J. Cox, and D. Duling, "Algorithms, initializations, and convergence for the nonnegative matrix factorization", CoRR, vol. abs/1407.7299, 2014.
  5. L. Gong and A. Nandi, "An enhanced initialization method for non-negative matrix factorization", in 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6, Sept. 2013.
  6. A. Pascual-Montano, J. M. Carazo, K. Kochi, D. Lehmann, and R. D. Pascual-Marqui, "Nonsmooth nonnegative matrix factorization (nsNMF)", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, pp. 403-415, 2006.

Examples

# Initialize the library
library(nmfgpu4R)
nmfgpu4R.init()

# Create dummy data (256 attributes x 1024 observations, i.e. the
# attributes-by-observations orientation expected by nmf)
data <- matrix(runif(256*1024), nrow=256, ncol=1024)

# Compute several factorization models
result <- nmf(data, 128, algorithm="mu", initMethod="K-Means/Random", maxiter=500)
result <- nmf(data, 128, algorithm="mu", initMethod="CopyExisting", 
                 parameters=list(W=result$W, H=result$H), maxiter=500)
result <- nmf(data, 128, algorithm="gdcls", maxiter=500, parameters=list(lambda=0.1))
result <- nmf(data, 128, algorithm="als", maxiter=500)
result <- nmf(data, 128, algorithm="acls", maxiter=500, 
                 parameters=list(lambdaH=0.1, lambdaW=0.1))
result <- nmf(data, 128, algorithm="ahcls", maxiter=500, 
                 parameters=list(lambdaH=0.1, lambdaW=0.1, alphaH=0.5, alphaW=0.5))
result <- nmf(data, 128, algorithm="nsnmf", maxiter=500, parameters=list(theta=0.25))

# Compute encoding matrices for training and test data
set.seed(42)
idx <- sample(1:nrow(iris), 100, replace=F)
data.train <- iris[idx,]
data.test <- iris[-idx,]

model.nmf <- nmf(t(data.train[,-5]), 2)  # transposed: nmf expects attributes x observations
encoding.train <- t(predict(model.nmf))  # encoding matrix H of the training data
encoding.test <- t(predict(model.nmf, t(data.test[,-5])))  # encoding of unseen test data

plot(encoding.train, col=data.train[,5], pch=1)
points(encoding.test, col=data.test[,5], pch=4)
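
# The fitted() method listed under Usage is not exercised above. A hedged
# sketch, assuming it returns the model's approximation of the training
# data (the product of the fitted W and H):
approx.train <- fitted(model.nmf)  # assumed to equal model.nmf$W %*% model.nmf$H
dim(approx.train)                  # attributes x observations, like t(data.train[,-5])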
