cmfrec (version 3.5.1-3)

factors: Calculate latent factors on new data

Description

Determine latent factors for new user(s)/row(s), given either `X` data (a.k.a. "warm-start"), or `U` data (a.k.a. "cold-start"), or both.

If both types of data (`X` and `U`) are passed and their numbers of rows differ, it is assumed that the shorter matrix has only missing values for the rows that are present only in the other matrix.
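The row-count rule can be pictured with base R alone. The sketch below only illustrates the stated assumption and is not the package's internal code; the helper `pad_rows_with_na` is a hypothetical name introduced for this example.

```r
# Hypothetical helper (not part of cmfrec): append all-NA rows to a dense
# matrix until it has `n` rows.
pad_rows_with_na <- function(mat, n) {
  missing_rows <- n - nrow(mat)
  if (missing_rows <= 0) return(mat)
  rbind(mat, matrix(NA_real_, nrow = missing_rows, ncol = ncol(mat)))
}

# Two new users with ratings, three new users with attributes.
X_new <- matrix(c( 5, NA, 3,
                  NA,  2, 4), nrow = 2, byrow = TRUE)
U_new <- matrix(c(0.1, 0.7,
                  0.4, 0.2,
                  0.9, 0.5), nrow = 3, byrow = TRUE)

# The third user has no row in X_new, so it is treated as having no ratings.
n_out <- max(nrow(X_new), nrow(U_new))
X_padded <- pad_rows_with_na(X_new, n_out)
```

Under this reading, the third user would get cold-start factors (from `U` only), while the first two would use both sources.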

Note: this function does not perform any internal re-indexing of the data. If the `X` to which the model was fit was a `data.frame`, the item numbering is stored under `model$info$item_mapping`. There is also a function factors_single, which lets the model do the appropriate re-indexing.
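Because no re-indexing is done, item IDs have to be translated into column positions before building `X`. The following is a minimal sketch of that step. It uses a stand-in vector instead of a fitted model, and it assumes that the mapping lists the original item IDs in the column order used by the model; the IDs and the `new_ratings` table are made up for illustration.

```r
library(Matrix)

# Stand-in for `model$info$item_mapping`; assumed here to list the original
# item IDs in the order of the columns used by the model.
item_mapping <- c("item_a", "item_b", "item_c", "item_d", "item_e")

# Ratings from one new user, keyed by original item ID (made-up data).
new_ratings <- data.frame(
  ItemId = c("item_d", "item_b"),
  Rating = c(4, 2),
  stringsAsFactors = FALSE
)

# Translate IDs into 1-based column positions.
col_idx <- match(new_ratings$ItemId, item_mapping)
stopifnot(!anyNA(col_idx))  # an NA would mean an item unseen during fitting

# Sort by position, then build a sparse row vector (class `dsparseVector`),
# which is one of the accepted `X` formats.
ord <- order(col_idx)
x_new <- sparseVector(
  x = new_ratings$Rating[ord],
  i = col_idx[ord],
  length = length(item_mapping)
)
```

The resulting `x_new` could then be passed as `X`. When the IDs should be handled by the model itself, factors_single is the alternative mentioned above.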

For example usage, see the main section fit_models.

Usage

factors(model, ...)

# S3 method for CMF
factors(
  model,
  X = NULL,
  U = NULL,
  U_bin = NULL,
  weight = NULL,
  output_bias = FALSE,
  nthreads = model$info$nthreads,
  ...
)

# S3 method for CMF_implicit
factors(model, X = NULL, U = NULL, nthreads = model$info$nthreads, ...)

# S3 method for ContentBased
factors(model, U, nthreads = model$info$nthreads, ...)

# S3 method for OMF_explicit
factors(
  model,
  X = NULL,
  U = NULL,
  weight = NULL,
  output_bias = FALSE,
  output_A = FALSE,
  exact = FALSE,
  nthreads = model$info$nthreads,
  ...
)

# S3 method for OMF_implicit
factors(
  model,
  X = NULL,
  U = NULL,
  output_A = FALSE,
  nthreads = model$info$nthreads,
  ...
)

Value

If passing `output_bias=FALSE` and `output_A=FALSE`, and for the implicit-feedback models, returns a matrix with the obtained latent factors for each row/user given the `X` and/or `U` data (the number of rows is `max(nrow(X), nrow(U), nrow(U_bin))`). If either of those options is passed as `TRUE`, returns a list with the following elements:

  • `factors`: The obtained latent factors (a matrix).

  • `bias`: (If passing `output_bias=TRUE`) A vector with the obtained biases for each row/user.

  • `A`: (If passing `output_A=TRUE`) The raw `A` factors matrix (which is added to the factors determined from user attributes in order to obtain the factorization parameters).
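To make the two return shapes concrete, here is a small sketch on synthetic data. The data, sizes, and hyperparameters (`k = 4L`, single thread) are arbitrary choices for illustration, and the model is fit with `CMF()` from this package using a dense `X` and a dense `U`.

```r
library(cmfrec)

set.seed(123)
n_users <- 40L
n_items <- 25L
n_attr  <- 3L

# Synthetic training data: dense ratings with missing entries, plus user attributes.
X <- matrix(rnorm(n_users * n_items), nrow = n_users)
X[sample(length(X), 300L)] <- NA
U <- matrix(rnorm(n_users * n_attr), nrow = n_users)

model <- CMF(X, U = U, k = 4L, nthreads = 1L, verbose = FALSE)

# New data for three users: ratings (with gaps) and attributes.
X_new <- matrix(rnorm(3L * n_items), nrow = 3L)
X_new[sample(length(X_new), 30L)] <- NA
U_new <- matrix(rnorm(3L * n_attr), nrow = 3L)

# Default options: a plain matrix with one row per new user.
f_mat <- factors(model, X = X_new, U = U_new)

# With `output_bias = TRUE`: a list holding `factors` and `bias`.
f_list <- factors(model, X = X_new, U = U_new, output_bias = TRUE)
```

Passing only `U = U_new` would give the cold-start variant of the same call.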

Arguments

model

A collective matrix factorization model from this package - see fit_models for details.

...

Not used.

X

New `X` data, with rows denoting new users. Can be passed in the following formats:

  • A sparse COO/triplets matrix, either from package `Matrix` (class `dgTMatrix`), or from package `SparseM` (class `matrix.coo`).

  • A sparse matrix in CSR format, either from package `Matrix` (class `dgRMatrix`), or from package `SparseM` (class `matrix.csr`). Passing the input as CSR is faster than passing it as COO, since COO input is converted internally.

  • A sparse row vector from package `Matrix` (class `dsparseVector`).

  • A dense matrix from base R (class `matrix`), with missing entries set as `NA`/`NaN`.

  • A dense row vector from base R (class `numeric`), with missing entries set as `NA`/`NaN`.

Dense `X` data is not supported for `CMF_implicit` or `OMF_implicit`.
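The accepted formats can be built from the same toy data as follows. This is a sketch using the `Matrix` package only; it assumes a `Matrix` version whose `sparseMatrix()` supports the `repr` argument, and the data values are made up.

```r
library(Matrix)

# Toy data: 3 new users, 4 items, 5 observed entries.
rows <- c(1L, 1L, 2L, 3L, 3L)
cols <- c(1L, 3L, 2L, 1L, 4L)
vals <- c(5, 3, 4, 1, 2)

# COO/triplets format (class `dgTMatrix`).
X_coo <- sparseMatrix(i = rows, j = cols, x = vals, dims = c(3L, 4L), repr = "T")

# CSR format (class `dgRMatrix`).
X_csr <- sparseMatrix(i = rows, j = cols, x = vals, dims = c(3L, 4L), repr = "R")

# Sparse row vector for a single user (class `dsparseVector`).
x_sparse_row <- sparseVector(x = c(5, 3), i = c(1L, 3L), length = 4L)

# Dense matrix: unobserved entries are NA, not zero.
X_dense <- matrix(NA_real_, nrow = 3L, ncol = 4L)
X_dense[cbind(rows, cols)] <- vals

# Dense row vector for a single user.
x_dense_row <- X_dense[1L, ]
```

Only the sparse variants apply to `CMF_implicit` and `OMF_implicit`, as noted above.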

U

New `U` data, with rows denoting new users. Can be passed in the same formats as `X`, or additionally as a `data.frame`, which will be internally converted to a matrix.

U_bin

New binary columns of `U`. Must be passed as a dense matrix from base R or as a `data.frame`.

weight

Associated observation weights for entries in `X`. If passed, must have the same shape as `X`. That is, if `X` is a sparse matrix, `weight` should be a numeric vector with length equal to the number of non-missing elements (or a sparse matrix in the same format, in which case no checks are made on its indices); if `X` is a dense matrix, `weight` should also be a dense matrix with the same number of rows and columns.
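The shape requirement for `weight` can be sketched as below. The data is made up, and the weights in the sparse case are assumed to be aligned with the order of the stored entries of `X`, which is why the triplets matrix is constructed directly from its slots here.

```r
library(Matrix)

# Sparse case: build a COO matrix directly so that the stored order is known.
# Slots `i` and `j` hold 0-based row and column indices.
X_coo <- new(
  "dgTMatrix",
  i = c(0L, 1L, 1L),
  j = c(1L, 0L, 2L),
  x = c(4, 5, 1),
  Dim = c(2L, 3L)
)

# One weight per stored entry of X_coo.
w_sparse <- c(1.0, 0.5, 2.0)
stopifnot(length(w_sparse) == length(X_coo@x))

# Dense case: the weights form a matrix with the same dimensions as X.
X_dense <- matrix(c(NA,  4, NA,
                     5, NA,  1), nrow = 2L, byrow = TRUE)
W_dense <- matrix(1, nrow = nrow(X_dense), ncol = ncol(X_dense))
W_dense[2L, 1L] <- 0.5   # down-weight the rating of user 2 for item 1
stopifnot(identical(dim(W_dense), dim(X_dense)))
```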

output_bias

Whether to also return the user bias determined by the model given the data in `X`.

nthreads

Number of parallel threads to use.

output_A

Whether to return the raw `A` factors (the free offset).

exact

(In the `OMF_explicit` model) Whether to calculate `A` and `Am` with the regularization applied to `A` instead of to `Am` (if using the L-BFGS method, this is how the model was fit). This is usually a slower procedure. Only relevant when passing `X` data.

Details

Note that, regardless of whether the model was fit with the L-BFGS method or with the ALS method (using either the CG or the Cholesky solver), the new factors will be determined through the Cholesky method or through the precomputed matrices (e.g. a simple matrix-matrix multiply for the `ContentBased` model). The exception is when passing `U_bin`, in which case they will be determined through the same L-BFGS method with which the model was fit.
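As rough intuition for what a Cholesky-based solve for new factors looks like, the sketch below computes regularized least-squares factors for one new user against a fixed item-factor matrix. It is a deliberately simplified illustration, not the package's implementation: it ignores biases, centering, side information, and weights, and the names `B`, `lambda`, and `solve_user_factors` are assumptions introduced here.

```r
# Hypothetical helper: closed-form factors for one user, given item factors `B`
# (items x k), the user's ratings `x` (NA = unobserved), and an L2 penalty.
solve_user_factors <- function(B, x, lambda) {
  obs <- !is.na(x)
  B_obs <- B[obs, , drop = FALSE]
  # Normal equations: (B_obs' B_obs + lambda * I) a = B_obs' x_obs
  lhs <- crossprod(B_obs) + diag(lambda, ncol(B))
  rhs <- crossprod(B_obs, x[obs])
  # Cholesky factorization lhs = R'R, followed by two triangular solves.
  R <- chol(lhs)
  a <- backsolve(R, forwardsolve(t(R), rhs))
  drop(a)
}

set.seed(1)
B <- matrix(rnorm(6 * 2), nrow = 6)      # 6 items, k = 2 (synthetic)
x <- c(1.5, NA, -0.3, NA, 0.8, 2.0)      # ratings from one new user
a_new <- solve_user_factors(B, x, lambda = 1)
```

Since the system is only `k` by `k`, this kind of solve is cheap per user, which is consistent with it being used for new data regardless of how the model was fit.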

See Also

factors_single