predict: Clustering and Prediction

Description

The predict methods for NMF models return the cluster membership of each sample or each feature. Classification or prediction of new data is not currently implemented.

Usage

predict(object, ...)
  ## S3 method for class 'NMF':
predict(object,
    what = c("columns", "rows", "samples", "features"),
    prob = FALSE, dmatrix = FALSE)
  ## S3 method for class 'NMFfitX':
predict(object,
    what = c("columns", "rows", "samples", "features", "consensus", "chc"),
    dmatrix = FALSE, ...)

Arguments

object

an NMF model

what

a character string that indicates which type of cluster membership should be returned: columns or rows for clustering the columns or the rows of the target matrix, respectively. The values samples and features are aliases for columns and rows, respectively.

prob

logical that indicates if the relative contributions of/to the dominant basis component should be computed and returned. See Details.

dmatrix

logical that indicates if a dissimilarity matrix should be attached to the result. This is notably used internally when computing NMF clustering silhouettes.

...

additional arguments affecting the predictions produced.

Details

The cluster membership is computed as the index of the dominant basis component for each sample (what='samples' or 'columns') or each feature (what='features' or 'rows'), based on their corresponding entries in the coefficient matrix or basis matrix respectively.

For example, if what='samples', then the dominant basis component is computed for each column of the coefficient matrix as the row index of the maximum within the column.
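
The rule above can be illustrated in base R. This is a minimal sketch on a made-up coefficient matrix, not the package's actual implementation; the matrix H and its values are assumptions chosen only to make the result easy to check by hand.

```r
# Toy coefficient matrix H: 3 basis components (rows) x 4 samples (columns).
# The values are made up for illustration; matrix() fills column by column.
H <- matrix(c(0.7, 0.2, 0.1,   # sample 1
              0.1, 0.1, 0.8,   # sample 2
              0.3, 0.5, 0.2,   # sample 3
              0.6, 0.3, 0.1),  # sample 4
            nrow = 3)

# For each column (sample), take the row index of the largest entry:
# this index is the dominant basis component, i.e. the cluster membership.
membership <- factor(apply(H, 2, which.max))
membership
```

Here samples 1 and 4 are assigned to component 1, sample 2 to component 3 and sample 3 to component 2. The same idea applied row-wise to a basis matrix would give the feature clusters.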

If argument prob=FALSE (default), the result is a factor. Otherwise a list with two elements is returned: element predict contains the cluster membership index (as a factor) and element prob contains the relative contribution of the dominant component to each sample (resp. the relative contribution of each feature to the dominant basis component):

Samples: $$p_j = x_{k_0} / \sum_k x_k$$ for each sample $1 \leq j \leq p$, where $p$ is the number of samples, $x_k$ is the contribution of the $k$-th basis component to the $j$-th sample (i.e. H[k, j]), and $x_{k_0}$ is the maximum of these contributions.
Features: $$p_i = y_{k_0} / \sum_k y_k$$ for each feature $1 \leq i \leq n$, where $n$ is the number of features, $y_k$ is the contribution of the $i$-th feature to the $k$-th basis component (i.e. W[i, k]), and $y_{k_0}$ is the maximum of these contributions.
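
As a rough illustration of these two formulas, the following base R sketch computes the relative contributions on toy matrices. The matrices H and W and their values are assumptions made up for this example, and the code is not taken from the package.

```r
# Toy coefficient matrix H: 3 basis components x 4 samples (made-up values).
H <- matrix(c(0.7, 0.2, 0.1,
              0.1, 0.1, 0.8,
              0.3, 0.5, 0.2,
              0.6, 0.3, 0.1),
            nrow = 3)

# Samples: maximum of each column of H divided by the column sum.
p_samples <- apply(H, 2, max) / colSums(H)

# Toy basis matrix W: 4 features x 3 basis components (made-up values).
W <- matrix(c(2, 1, 1,
              1, 3, 0,
              0, 0, 5,
              1, 1, 2),
            nrow = 4, byrow = TRUE)

# Features: maximum of each row of W divided by the row sum.
p_features <- apply(W, 1, max) / rowSums(W)

p_samples
p_features
```

A value close to 1 means that a single basis component accounts for almost all of the sample (or feature), while a value close to 1/k, with k the number of basis components, means that the contributions are spread almost evenly.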

References

Brunet J, Tamayo P, Golub TR and Mesirov JP (2004). "Metagenes and molecular pattern discovery using matrix factorization." _Proceedings of the National Academy of Sciences of the United States of America_, *101*(12), pp. 4164-9. ISSN 0027-8424.

Pascual-Montano A, Carazo JM, Kochi K, Lehmann D and Pascual-Marqui RD (2006). "Nonsmooth nonnegative matrix factorization (nsNMF)." _IEEE Trans. Pattern Anal. Mach. Intell._, *28*, pp. 403-415.

Examples

# load the NMF package, which provides rmatrix(), nmf() and the predict methods
library(NMF)

# random target matrix
v <- rmatrix(20, 10)
# fit an NMF model
x <- nmf(v, 5)

# predicted column and row clusters
predict(x)
predict(x, 'rows')

# with relative contributions of each basis component
predict(x, prob=TRUE)
predict(x, 'rows', prob=TRUE)