quanteda.textmodels (version 0.9.7)

textmodel_lsa: Latent Semantic Analysis

Description

Fit the Latent Semantic Analysis scaling model to a dfm, which may be weighted (for instance using quanteda::dfm_tfidf()).

Usage

textmodel_lsa(x, nd = 10, margin = c("both", "documents", "features"))

Value

a textmodel_lsa class object, a list containing:

  • sk a numeric vector containing the singular values (d) from the SVD

  • docs document coordinates from the SVD (u)

  • features feature coordinates from the SVD (v)

  • matrix_low_rank the low-rank approximation of the input, computed as the matrix product u d v'

  • data the input data as a CSparseMatrix from the Matrix package
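
As a rough illustration of how these components relate to one another, the following base R sketch builds the analogous quantities from a truncated SVD of a small dense matrix. The toy matrix and every variable name here are assumptions made for illustration; this is not the package's own code, which works on a sparse dfm.

```r
# Base R sketch: how the returned components relate to a truncated SVD.
# The toy matrix and all variable names are illustrative assumptions.
set.seed(1)
x <- matrix(rpois(6 * 8, lambda = 2), nrow = 6, ncol = 8)  # 6 "documents" x 8 "features"
nd <- 3

dec <- svd(x, nu = nd, nv = nd)   # keep only the first nd singular vectors
sk <- dec$d[seq_len(nd)]          # singular values, analogous to sk
docs <- dec$u                     # 6 x nd matrix, analogous to docs
features <- dec$v                 # 8 x nd matrix, analogous to features

# rank-nd reconstruction u d v', analogous to matrix_low_rank
matrix_low_rank <- docs %*% diag(sk) %*% t(features)

dim(matrix_low_rank)              # same shape as x
```

The reconstruction has the same dimensions as the input but rank at most nd, which is what makes it a smoothed version of the original counts.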

Arguments

x

the dfm on which the model will be fit

nd

the number of dimensions to be included in the output

margin

margin to be smoothed by the SVD

Author

Haiyan Wang and Kohei Watanabe

Details

The SVD is computed with svds() from the RSpectra package, which calculates only the requested leading singular values and vectors and is therefore fast on large sparse matrices.
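
To see what a truncated SVD buys, the sketch below compares RSpectra::svds() with a full dense svd() on a small random sparse matrix. The matrix, its size, and the choice k = 3 are assumptions for illustration only; the call inside textmodel_lsa() may pass different arguments.

```r
library(Matrix)    # sparse matrix classes and rsparsematrix()
library(RSpectra)  # svds(): truncated SVD

set.seed(42)
m <- rsparsematrix(nrow = 20, ncol = 30, density = 0.2)  # toy sparse matrix (assumption)
k <- 3

dec_trunc <- svds(m, k = k)      # computes only the k largest singular triplets
dec_full <- svd(as.matrix(m))    # dense, full decomposition for comparison

# the k leading singular values should agree up to numerical tolerance
all.equal(dec_trunc$d, dec_full$d[seq_len(k)], tolerance = 1e-6)
```

On a matrix this small the two approaches cost about the same; the benefit of the truncated solver shows up when the dfm has many thousands of features and only a handful of dimensions are needed.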

References

Rosario, B. (2000). Latent Semantic Indexing: An Overview. Technical report INFOSYS 240 Spring Paper, University of California, Berkeley.

Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., & Harshman, R. (1990). Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, 41(6): 391-407.

See Also

predict.textmodel_lsa(), coef.textmodel_lsa()

Examples

library("quanteda")
library("quanteda.textmodels")  # provides textmodel_lsa() and data_corpus_irishbudget2010
dfmat <- dfm(tokens(data_corpus_irishbudget2010))
# fit an LSA space on the first 10 documents and inspect their coordinates in it
tmod <- textmodel_lsa(dfmat[1:10, ])
head(tmod$docs)

# low-rank approximation of the input matrix (first 5 features)
tmod$matrix_low_rank[, 1:5]

# fold new documents (queries) into the space fitted on dfmat[1:10, ]
# and return their truncated representation in that low-rank space
pred <- predict(tmod, newdata = dfmat[11:14, ])
pred$docs_newspace
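
The idea of folding new documents into an existing LSA space can be sketched in base R with the standard fold-in formula, in which new rows are projected onto the feature vectors and rescaled by the inverse singular values. This is an assumption about the general technique, not a statement of how predict.textmodel_lsa() is implemented; the toy matrices and variable names are hypothetical.

```r
# Base R sketch of the standard LSA fold-in (illustrative, not the package code).
set.seed(2)
train <- matrix(rpois(5 * 7, lambda = 2), nrow = 5)  # 5 training "documents"
query <- matrix(rpois(2 * 7, lambda = 2), nrow = 2)  # 2 new "documents", same 7 features
nd <- 3

dec <- svd(train, nu = nd, nv = nd)
d <- dec$d[seq_len(nd)]

# fold-in: project onto the feature vectors, then rescale by 1 / singular values
query_coords <- query %*% dec$v %*% diag(1 / d)

# sanity check: folding in the training rows recovers their own coordinates
train_coords <- train %*% dec$v %*% diag(1 / d)
all.equal(train_coords, dec$u)
```

The sanity check works because the feature vectors are orthonormal, so projecting the training matrix onto them and undoing the singular-value scaling returns the original document coordinates.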
