recosystem (version 0.5.1)

train: Training a Recommender Model

Description

This method is a member function of class "RecoSys" that trains a recommender model. It reads from a training data source and either writes a model file to the specified location or, if out_model is NULL, keeps the model in memory. The model contains the information needed for prediction.

The common usage of this method is

r = Reco()
r$train(train_data, out_model = file.path(tempdir(), "model.txt"),
        opts = list())

Arguments

r

Object returned by Reco().

train_data

An object of class "DataSource" that describes the source of training data, typically returned by function data_file(), data_memory(), or data_matrix().

out_model

Path to the model file that will be created. If NULL, the model is stored in memory, and the model matrices can then be accessed via r$model$matrices.

opts

A list of parameters and options for model training. See the section Parameters and Options for details.
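As a minimal sketch of the in-memory case described for out_model, the code below trains on a small synthetic data set and inspects r$model$matrices. The toy sizes (n_user, n_item, n_obs) and the random ratings are made up for illustration, and the internal layout of the stored matrices is not specified on this page, so the sketch only prints the structure with str() rather than assuming component names or shapes.

```r
library(recosystem)

set.seed(1)
# Hypothetical toy data: 30 users, 20 items, 300 observed ratings
n_user = 30
n_item = 20
n_obs  = 300
user   = sample(n_user, n_obs, replace = TRUE)
item   = sample(n_item, n_obs, replace = TRUE)
rating = sample(1:5, n_obs, replace = TRUE)

# The sampled indices start from 1, so index1 = TRUE
train_data = data_memory(user, item, rating = rating, index1 = TRUE)

r = Reco()
# out_model = NULL keeps the trained model in memory instead of writing a file
r$train(train_data, out_model = NULL,
        opts = list(dim = 5, niter = 5, nthread = 1, verbose = FALSE))

# Inspect whatever is stored, without assuming its layout
str(r$model$matrices)
```

Because no model file is written in this mode, the model lives only in the r object for the duration of the R session.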

Parameters and Options

The opts argument is a list that can supply any of the following parameters:

loss

Character string, the loss function. Default is "l2"; see below for details.

dim

Integer, the number of latent factors. Default is 10.

costp_l1

Numeric, L1 regularization parameter for user factors. Default is 0.

costp_l2

Numeric, L2 regularization parameter for user factors. Default is 0.1.

costq_l1

Numeric, L1 regularization parameter for item factors. Default is 0.

costq_l2

Numeric, L2 regularization parameter for item factors. Default is 0.1.

lrate

Numeric, the learning rate, which can be thought of as the step size in gradient descent. Default is 0.1.

niter

Integer, the number of iterations. Default is 20.

nthread

Integer, the number of threads for parallel computing. Default is 1.

nbin

Integer, the number of bins. Must be greater than nthread. Default is 20.

nmf

Logical, whether to perform non-negative matrix factorization. Default is FALSE.

verbose

Logical, whether to show detailed information. Default is TRUE.
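To make the effect of a partial opts list concrete, the sketch below overlays user-supplied values on the defaults documented above using base R's modifyList(). This is only an illustration of which values end up in effect; how recosystem merges options internally is not described here, and the names default_opts, user_opts, and effective_opts are hypothetical.

```r
# Defaults as listed in this section (recosystem 0.5.1)
default_opts = list(
    loss = "l2", dim = 10L,
    costp_l1 = 0, costp_l2 = 0.1,
    costq_l1 = 0, costq_l2 = 0.1,
    lrate = 0.1, niter = 20L, nthread = 1L, nbin = 20L,
    nmf = FALSE, verbose = TRUE
)

# A partial list, as one would pass to r$train(..., opts = user_opts)
user_opts = list(dim = 20, costp_l2 = 0.01, costq_l2 = 0.01, nthread = 1)

# Supplied values override the defaults; everything else is left unchanged
effective_opts = utils::modifyList(default_opts, user_opts)

# Documented constraint: nbin must be greater than nthread
stopifnot(effective_opts$nbin > effective_opts$nthread)

str(effective_opts)
```

Checking the nbin > nthread constraint before training can be useful when nthread is raised on a multi-core machine while nbin is left at its default.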

The loss option may take the following values:

For real-valued matrix factorization,

"l2"

Squared error (L2-norm)

"l1"

Absolute error (L1-norm)

"kl"

Generalized KL-divergence

For binary matrix factorization,

"log"

Logarithmic error

"squared_hinge"

Squared hinge loss

"hinge"

Hinge loss

For one-class matrix factorization,

"row_log"

Row-oriented pair-wise logarithmic loss

"col_log"

Column-oriented pair-wise logarithmic loss
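For intuition, the real-valued and binary losses listed above can be written as per-entry functions in base R. These are the usual textbook forms, given as a sketch only: the exact scaling used by the underlying library may differ, and the label convention y in {-1, 1} for the binary case is an assumption. The function names are hypothetical, and the pair-wise one-class losses are omitted because they compare pairs of entries rather than a single one.

```r
# r: observed value, p: predicted value (inner product of the two factor vectors)
loss_l2 = function(r, p) (r - p)^2          # squared error
loss_l1 = function(r, p) abs(r - p)         # absolute error
# Generalized KL-divergence, defined for r > 0 and p > 0
loss_kl = function(r, p) r * log(r / p) - r + p

# Binary case: y is the label (assumed to be -1 or 1), p is a real-valued score
loss_log           = function(y, p) log1p(exp(-y * p))
loss_hinge         = function(y, p) pmax(0, 1 - y * p)
loss_squared_hinge = function(y, p) pmax(0, 1 - y * p)^2

loss_l2(4, 3.5)               # 0.25
loss_l1(4, 3.5)               # 0.5
loss_kl(2, 2)                 # 0
loss_hinge(1, 0.25)           # 0.75
loss_squared_hinge(1, 0.25)   # 0.5625
loss_log(1, 0)                # log(2)
```

The squared error penalizes large deviations more heavily than the absolute error, which is one reason "l1" may be preferred when the ratings contain outliers.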

Author

Yixuan Qiu <https://statr.me>

References

W.-S. Chin, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. A Fast Parallel Stochastic Gradient Method for Matrix Factorization in Shared Memory Systems. ACM TIST, 2015.

W.-S. Chin, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. A Learning-rate Schedule for Stochastic Gradient Methods to Matrix Factorization. PAKDD, 2015.

W.-S. Chin, B.-W. Yuan, M.-Y. Yang, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. LIBMF: A Library for Parallel Matrix Factorization in Shared-memory Systems. Technical report, 2015.

See Also

$tune(), $output(), $predict()

Examples

## Training model from a data file
train_set = system.file("dat", "smalltrain.txt", package = "recosystem")
train_data = data_file(train_set)
r = Reco()
set.seed(123) # This is a randomized algorithm
# The model will be saved to a file
r$train(train_data, out_model = file.path(tempdir(), "model.txt"),
        opts = list(dim = 20, costp_l2 = 0.01, costq_l2 = 0.01, nthread = 1)
)

## Training model from data in memory
train_df = read.table(train_set, sep = " ", header = FALSE)
train_data = data_memory(train_df[, 1], train_df[, 2], rating = train_df[, 3])
set.seed(123)
# The model will be stored in memory
r$train(train_data, out_model = NULL,
        opts = list(dim = 20, costp_l2 = 0.01, costq_l2 = 0.01, nthread = 1)
)

## Training model from data in a sparse matrix
if(require(Matrix))
{
    # repr = "T" builds the matrix in triplet form;
    # index1 = FALSE treats i and j as 0-based indices
    mat = Matrix::sparseMatrix(i = train_df[, 1], j = train_df[, 2], x = train_df[, 3],
                               repr = "T", index1 = FALSE)
    train_data = data_matrix(mat)
    r$train(train_data, out_model = NULL,
            opts = list(dim = 20, costp_l2 = 0.01, costq_l2 = 0.01, nthread = 1))
}