Implements improvements to RESCAL so that graphs with millions of entities can be factorized. It uses parallelization and a compact representation of sparse matrices. The error calculation criterion is modified to be more efficient: it is based on the change in A and R instead of calculating the tensor difference.
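The idea of a stopping rule based on the change in A and R can be illustrated with a minimal sketch. The helper name `rel_change` and the exact formula (Frobenius norms of the differences, relative to the previous factors) are assumptions made for illustration; the criterion actually used by the package may differ.

```r
# Hypothetical helper: relative change of the factors between two iterations.
# A_old, A_new: n x r matrices; R_old, R_new: lists of r x r matrices.
rel_change <- function(A_old, A_new, R_old, R_new) {
  # Frobenius norm of the change in A
  dA <- norm(A_new - A_old, type = "F")
  # Sum of Frobenius norms of the change in every slice of R
  dR <- sum(mapply(function(a, b) norm(b - a, type = "F"), R_old, R_new))
  # Normalize by the size of the previous factors
  base <- norm(A_old, type = "F") + sum(sapply(R_old, norm, type = "F"))
  (dA + dR) / base
}

set.seed(1)
A_old <- matrix(runif(12), nrow = 4)           # 4 entities, rank 3
R_old <- list(diag(3), matrix(0.5, 3, 3))      # 2 relations
A_new <- A_old + 1e-6                          # a tiny update
R_new <- lapply(R_old, function(Rk) Rk + 1e-6)

change <- rel_change(A_old, A_new, R_old, R_new)
converged <- change < 0.001                    # compare against epsilon
```

A check of this kind only touches the factors, so it avoids forming the n by n reconstruction of every slice that a tensor-difference error would require.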
scRescal(X, rnk, ainit = "nvecs", verbose = 2, Ainit = NULL, Rinit = NULL, lambdaA = 0,
lambdaR = 0, lambdaV = 0, epsilon = 0.001, maxIter = 100, minIter = 1, P = list(),
orthogonalize = FALSE, func_compute_fval = "compute_fit", retAllFact = FALSE,
useQR = FALSE, ncores = 0, OS_WIN = FALSE, savePath = "", oneCluster = TRUE,
useXc = FALSE, saveARinit = FALSE, saveIter = 0, dsname = "", maxNrows = 50000,
generateLog = FALSE)
X: a sparse tensor given as a list of sparse matrices, one for every relation (predicate).
rnk: the rank of the factorization.
ainit: the method used to initialize matrix A.
verbose: the level of messages to be displayed; 0 is minimal.
Ainit: the initial value of matrix A; can be used to continue a factorization from previous results.
Rinit: the initial value of R (the core tensor, as a list of frontal slices).
lambdaA: regularization parameter for the A factor matrix. 0 by default.
lambdaR: regularization parameter for the R_k factor matrices. 0 by default.
lambdaV: regularization parameter for R_k factor matrices. 0 by default.
epsilon: error threshold. 0.001 by default.
maxIter: maximum number of iterations.
minIter: minimum number of iterations.
P: not implemented.
orthogonalize: not implemented.
func_compute_fval: the function used to compute the fit.
retAllFact: flag to return intermediate values of A and R.
useQR: use the QR variant suggested by Nickel et al. in "Factorizing YAGO" and implemented by Michail Huffman; found to converge more slowly.
ncores: number of cores used to run in parallel; 0 means no parallelism.
useXc: compact the sparse tensor; requires more space but is much faster.
saveIter: iterations in which A and R are to be saved; default 0 (none).
saveARinit: option to save the initial A and R.
maxNrows: limit used in updateA to decide whether the compact form of a predicate's sparse matrix is handled as a dense matrix (number of rows < maxNrows) or as a sparse matrix; predicates with more rows than this (in compact form) are treated as big. Default 50000.
OS_WIN: TRUE when the operating system is Windows; used to decide whether forking can be used when running in parallel.
savePath: optional path in which to save intermediate results.
oneCluster: Boolean flag indicating that a single cluster is reused across the different steps when running in parallel.
dsname: the dataset name.
generateLog: when running in parallel, save the output to a log file in the current directory.
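As an illustration of the input format described for X, the following sketch builds a small tensor as a list of sparse matrices, one per predicate, using the Matrix package. The triples and variable names are made up, and it is an assumption that scRescal accepts exactly the sparse matrix class produced here.

```r
library(Matrix)

# Toy triples (subject, predicate, object) as integer ids; hypothetical data
triples <- data.frame(
  s = c(1, 2, 3, 1),
  p = c(1, 1, 2, 2),
  o = c(2, 3, 1, 3)
)
n <- 3  # number of entities

# One n x n sparse adjacency matrix per predicate
X <- lapply(split(triples, triples$p), function(tp) {
  sparseMatrix(i = tp$s, j = tp$o, x = 1, dims = c(n, n))
})

length(X)      # number of predicates (frontal slices)
X[[1]][1, 2]   # 1: the triple (1, predicate 1, 2) is present
```

Every slice has the same n by n shape, with rows indexing subjects and columns indexing objects.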
Returns a list, list(A = A, R = R, all_err, nitr = itr + 1, times = as.vector(exectimes)), containing the following:
A: the matrix A of the factorization (n by r).
R: the core tensor R of the factorization, as a list of r-by-r matrices (r is the rank), one per relation.
nitr: number of iterations.
times: running times of each step.
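To show how the returned factors relate to the input, this sketch reconstructs one slice as A R_k A^T, the RESCAL model of Nickel et al. (2011). The toy A and R below only stand in for the A and R components of a result; they are not output of scRescal.

```r
# Toy factors standing in for the A (n x r) and R (list of r x r) components
A <- matrix(c(1, 0,
              0, 1,
              1, 1), nrow = 3, byrow = TRUE)   # 3 entities, rank 2
R <- list(matrix(c(0, 1,
                   0, 0), nrow = 2, byrow = TRUE))

# Reconstructed scores for relation k: X_k is approximated by A R_k A^T
k <- 1
Xk_hat <- A %*% R[[k]] %*% t(A)

# Score of a single triple (subject i, relation k, object j)
i <- 1; j <- 2
score <- drop(A[i, , drop = FALSE] %*% R[[k]] %*% t(A[j, , drop = FALSE]))
```

Scoring a single triple this way only needs two rows of A and one slice of R, so the full n by n reconstruction is not required for large graphs.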
-Maximilian Nickel, Volker Tresp, Hans-Peter Kriegel, "A Three-Way Model for Collective Learning on Multi-Relational Data", ICML 2011, Bellevue, WA, USA
-Maximilian Nickel, Volker Tresp, Hans-Peter Kriegel, "Factorizing YAGO: Scalable Machine Learning for Linked Data", WWW 2012, Lyon, France
-Desouki et al., "SynthG: Mimicking RDF Graphs Using Tensor Factorization", IEEE ICSC 2021
# NOT RUN {
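# Load the UMLS example tensor shipped with the package
data('umls_tnsr')
ntnsr <- umls_tnsr
# Factorize with rank 10 on 2 cores (OS_WIN = TRUE for Windows)
tt <- scRescal(ntnsr$X, rnk = 10, ainit = 'nvecs', verbose = 1, lambdaA = 0,
               epsilon = 1e-4, lambdaR = 0, ncores = 2, OS_WIN = TRUE)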
# }