Fit the Latent Semantic Analysis scaling model to a dfm, which may be
weighted (for instance using dfm_tfidf).
textmodel_lsa(x, nd = 10, margin = c("both", "documents", "features"))
x: the dfm on which the model will be fit
nd: the number of dimensions to be included in output
margin: margin to be smoothed by the SVD
The SVD is computed with svds() from the RSpectra package, which keeps the computation fast.
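As a rough illustration of what a truncated SVD of a document-feature matrix yields, the sketch below uses a small toy matrix. It is not the package's implementation: base R's svd() is used as a stand-in for RSpectra's svds() so that the example has no dependencies, and the names x, nd, u, d, v and x_low_rank are chosen here for illustration only.

```r
# Toy document-feature matrix: 4 documents (rows) x 5 features (columns).
x <- matrix(c(2, 1, 0, 0, 1,
              1, 2, 0, 1, 0,
              0, 0, 3, 1, 0,
              0, 1, 2, 2, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("doc", 1:4), paste0("feat", 1:5)))

nd <- 2  # number of dimensions to keep (illustrative value)

# Assumption: base svd() computes the full decomposition, which is then
# truncated by hand; svds() would compute only the leading nd components.
dec <- svd(x)
u <- dec$u[, 1:nd, drop = FALSE]  # document coordinates in the reduced space
d <- dec$d[1:nd]                  # the nd largest singular values
v <- dec$v[, 1:nd, drop = FALSE]  # feature coordinates in the reduced space

# Rank-nd approximation of x, expressed in the original feature space.
x_low_rank <- u %*% diag(d, nrow = nd) %*% t(v)
dimnames(x_low_rank) <- dimnames(x)

round(x_low_rank, 2)
```

The rows of u play the role of the reduced document representation, and x_low_rank is the smoothed version of the input matrix.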
Rosario, Barbara. 2000. "Latent Semantic Indexing: An overview". Technical report INFOSYS 240 Spring Paper, University of California, Berkeley.
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. 1990. "Indexing by latent semantic analysis". Journal of the American Society for Information Science 41(6), 391.
# NOT RUN {
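library("quanteda")
# in quanteda >= 2.0, textmodel_lsa() and this corpus are provided by quanteda.textmodels
library("quanteda.textmodels")
# tokenize first: recent quanteda versions no longer accept a corpus in dfm()
ie_dfm <- dfm(tokens(data_corpus_irishbudget2010))
# create an LSA space from the first 10 documents and inspect their
# truncated representation in the low-rank space
ie_lsa <- textmodel_lsa(ie_dfm[1:10, ])
head(ie_lsa$docs)
# matrix in the low-rank LSA space
ie_lsa$matrix_low_rank[, 1:5]
# fold new documents (queries) into the space generated by ie_dfm[1:10, ]
# and return their truncated representation in that low-rank space
new_lsa <- predict(ie_lsa, ie_dfm[11:14, ])
new_lsa$docs_newspace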
# }
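The folding-in step in the example can be pictured with the textbook LSA projection, in which a new document vector q is mapped to q V D^-1. The sketch below is a minimal, dependency-free illustration of that formula and is not necessarily how predict() is implemented in the package; fold_in is a hypothetical helper, and base svd() again stands in for svds().

```r
# Toy training matrix: 3 documents x 4 features.
train <- matrix(c(1, 1, 0, 0,
                  0, 1, 1, 0,
                  0, 0, 1, 1),
                nrow = 3, byrow = TRUE)

nd <- 2  # illustrative number of dimensions
dec <- svd(train)
u <- dec$u[, 1:nd, drop = FALSE]
d <- dec$d[1:nd]
v <- dec$v[, 1:nd, drop = FALSE]

# Hypothetical helper: project rows of newdata into the existing LSA space
# using the standard fold-in formula q %*% V %*% D^-1.
fold_in <- function(newdata, v, d) {
  newdata %*% v %*% diag(1 / d, nrow = length(d))
}

# A new document that uses the same 4 features as the training matrix.
new_doc <- matrix(c(1, 0, 1, 0), nrow = 1)
fold_in(new_doc, v, d)

# Sanity check: folding in the training documents recovers their own coordinates.
max(abs(fold_in(train, v, d) - u))
```

The new documents must be expressed over the same features, in the same order, as the matrix used to build the space.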