recosystem (version 0.4.4)

train: Training a Recommender Model

Description

This method is a member function of class "RecoSys" that trains a recommender model. It reads from a training data source and creates a model file at the specified location. The model file contains the information needed for prediction.

The common usage of this method is

r = Reco()
r$train(train_data, out_model = file.path(tempdir(), "model.txt"),
        opts = list())

Arguments

r

Object returned by Reco().

train_data

An object of class "DataSource" that describes the source of training data, typically returned by function data_file() or data_memory().

out_model

Path to the model file that will be created.

opts

A list of parameters and options for model training. See section Parameters and Options for details.

Parameters and Options

The opts argument is a list that can supply any of the following parameters:

loss

Character string, the loss function. Default is "l2", see below for details.

dim

Integer, the number of latent factors. Default is 10.

costp_l1

Numeric, L1 regularization parameter for user factors. Default is 0.

costp_l2

Numeric, L2 regularization parameter for user factors. Default is 0.1.

costq_l1

Numeric, L1 regularization parameter for item factors. Default is 0.

costq_l2

Numeric, L2 regularization parameter for item factors. Default is 0.1.

lrate

Numeric, the learning rate, which can be thought of as the step size in gradient descent. Default is 0.1.

niter

Integer, the number of iterations. Default is 20.

nthread

Integer, the number of threads for parallel computing. Default is 1.

nbin

Integer, the number of bins. Must be greater than nthread. Default is 20.

nmf

Logical, whether to perform non-negative matrix factorization. Default is FALSE.

verbose

Logical, whether to show detailed information. Default is TRUE.
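
As a concrete illustration, the parameters above can be combined into a single opts list. This is a sketch, not taken from the package's own examples; the values shown are illustrative, not recommendations:

```r
library(recosystem)

r = Reco()
set.seed(123)  # This is a randomized algorithm

# Illustrative settings: 30 latent factors, light L2 regularization,
# non-negative factors, and four threads (note nbin must exceed nthread)
r$train(data_file(system.file("dat", "smalltrain.txt", package = "recosystem")),
        opts = list(dim      = 30,
                    costp_l2 = 0.05,
                    costq_l2 = 0.05,
                    lrate    = 0.1,
                    niter    = 20,
                    nthread  = 4,
                    nbin     = 20,
                    nmf      = TRUE,
                    verbose  = FALSE))
```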

The loss option may take the following values:

For real-valued matrix factorization,

"l2"

Squared error (L2-norm)

"l1"

Absolute error (L1-norm)

"kl"

Generalized KL-divergence

For binary matrix factorization,

"log"

Logarithmic error

"squared_hinge"

Squared hinge loss

"hinge"

Hinge loss

For one-class matrix factorization,

"row_log"

Row-oriented pair-wise logarithmic loss

"col_log"

Column-oriented pair-wise logarithmic loss
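
For example, to factorize a binary matrix one would select one of the binary losses. The sketch below uses data_memory() with toy indices and labels; the -1/1 coding of the ratings follows LIBMF's convention for binary losses, and the data are purely illustrative:

```r
library(recosystem)

# Toy data: user/item indices are 0-based by default in data_memory();
# ratings for binary losses are coded as -1 / 1
user  = c(0, 0, 1, 1, 2, 2)
item  = c(0, 1, 0, 2, 1, 2)
label = c(1, -1, 1, 1, -1, 1)

r = Reco()
set.seed(123)
r$train(data_memory(user, item, label),
        opts = list(loss = "log", dim = 5, nthread = 1, verbose = FALSE))
```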

References

W.-S. Chin, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. A Fast Parallel Stochastic Gradient Method for Matrix Factorization in Shared Memory Systems. ACM TIST, 2015.

W.-S. Chin, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. A Learning-rate Schedule for Stochastic Gradient Methods to Matrix Factorization. PAKDD, 2015.

W.-S. Chin, B.-W. Yuan, M.-Y. Yang, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. LIBMF: A Library for Parallel Matrix Factorization in Shared-memory Systems. Technical report, 2015.

See Also

$tune(), $output(), $predict()

Examples

## Training model from a data file
train_set = system.file("dat", "smalltrain.txt", package = "recosystem")
r = Reco()
set.seed(123) # This is a randomized algorithm
r$train(data_file(train_set),
        opts = list(dim = 20, costp_l2 = 0.01, costq_l2 = 0.01, nthread = 1)
)

## Training model from data in memory
train_df = read.table(train_set, sep = " ", header = FALSE)
set.seed(123)
r$train(data_memory(train_df[, 1], train_df[, 2], train_df[, 3]),
        opts = list(dim = 20, costp_l2 = 0.01, costq_l2 = 0.01, nthread = 1)
)
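
The model can also be written to a file via out_model and then used for prediction. This sketch assumes the bundled test file "smalltest.txt" and the out_memory() output destination described in $predict():

```r
## Saving the model to a file and predicting (sketch; file paths illustrative)
train_set  = system.file("dat", "smalltrain.txt", package = "recosystem")
test_set   = system.file("dat", "smalltest.txt",  package = "recosystem")
model_file = file.path(tempdir(), "model.txt")

r = Reco()
set.seed(123)
r$train(data_file(train_set), out_model = model_file,
        opts = list(dim = 20, costp_l2 = 0.01, costq_l2 = 0.01, nthread = 1))

# Predictions returned in memory; see $predict() for writing to a file
pred = r$predict(data_file(test_set), out_memory())
head(pred)
```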
