recosystem (version 0.2.5)

convert: Read Data File and Convert to Binary Format

Description

These methods are member functions of class "RecoSys" that convert training and testing data files into binary format. The conversion is a preprocessing step that must be run before model training, because data stored in this binary format can be accessed more efficiently.

The common usage of these methods is

r = Reco()
r$convert_train(rawfile, outdir, verbose = TRUE)
r$convert_test(rawfile, outdir, verbose = TRUE)

Arguments

r
Object returned by Reco()
rawfile
Path of data file, see section 'Data format' for details
outdir
Directory in which the output binary file will be generated. If missing, tempdir() will be used.
verbose
Whether to show detailed information. Default is TRUE.

Data format

The data file required by these methods must store a sparse matrix in triplet form, i.e., each line in the file contains three numbers

row col value

representing one entry of the rating matrix together with its location. In real applications, a line typically has the form

user_id item_id rating

NOTE: row and col start from 0. So if the first user gives the first item a rating of 3, the line will be

0 0 3
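As an illustration of the zero-based triplet format, here is a minimal base R sketch that writes such a file from ratings whose IDs start at 1. The data frame, its column names, and the temporary file are hypothetical and only serve the example; they are not part of the package.

```r
# Hypothetical ratings with 1-based user and item IDs (illustrative data only)
ratings <- data.frame(
  user_id = c(1, 1, 2, 3),
  item_id = c(1, 3, 2, 1),
  rating  = c(3, 5, 4, 2)
)

# Shift the IDs so that row and col start from 0, as the format requires
triplets <- data.frame(
  row   = ratings$user_id - 1,
  col   = ratings$item_id - 1,
  value = ratings$rating
)

# Write one "row col value" triplet per line: space-separated, no header
rawfile <- tempfile(fileext = ".txt")
write.table(triplets, rawfile, sep = " ", row.names = FALSE,
            col.names = FALSE, quote = FALSE)

# The first user rating the first item 3 becomes the line "0 0 3"
readLines(rawfile)[1]
```

The resulting path can then be passed as the rawfile argument of r$convert_train().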

NOTE: For testing data, the file also needs to contain three numbers on each line. If the rating values are unknown, you can put any number in the third position as a placeholder.

Example data files are contained in the recosystem/dat directory.
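The placeholder convention for testing data can be sketched in base R as follows. The (row, col) pairs and the choice of 0 as the placeholder are assumptions made for illustration; any number would serve.

```r
# Hypothetical (user, item) pairs to be scored, already 0-based
pairs <- data.frame(row = c(0, 1, 2), col = c(1, 0, 2))

# Ratings are unknown, so fill the third column with an arbitrary placeholder
pairs$value <- 0

# Write the pairs in the same "row col value" layout as the training data
testfile <- tempfile(fileext = ".txt")
write.table(pairs, testfile, sep = " ", row.names = FALSE,
            col.names = FALSE, quote = FALSE)

readLines(testfile)
```

Each line still carries three numbers, so the file has the layout that r$convert_test() expects.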

References

LIBMF: A Matrix-factorization Library for Recommender Systems. http://www.csie.ntu.edu.tw/~cjlin/libmf/

Y. Zhuang, W.-S. Chin, Y.-C. Juan, and C.-J. Lin. A Fast Parallel Stochastic Gradient Method for Matrix Factorization in Shared Memory Systems. Technical report, 2014.

See Also

train, output, predict

Examples

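library(recosystem)

# Locate the example data files shipped with the package
trainset = system.file("dat", "smalltrain.txt", package = "recosystem")
testset = system.file("dat", "smalltest.txt", package = "recosystem")

# Create the model object and convert both files to binary format
# (outdir is omitted, so tempdir() is used)
r = Reco()
r$convert_train(trainset)
r$convert_test(testset)
print(r)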