lsa (version 0.73.2)

as.textmatrix: Display a latent semantic space generated by Latent Semantic Analysis (LSA)

Description

Returns a latent semantic space (created by createLSAspace) in textmatrix format: rows are terms, columns are documents.

Usage

as.textmatrix( LSAspace )

Arguments

LSAspace

a latent semantic space generated by createLSAspace.

Value

textmatrix

a textmatrix representation of the latent semantic space.

Details

To allow comparisons between terms and documents, the internal format of the latent semantic space needs to be converted to a classical document-term matrix (just like the ones generated by textmatrix() that are of class `textmatrix').

Remark: There are other ways to compare documents and terms using the partial matrices from an LSA space directly. See (Berry, 1995) for more information.

References

Berry, M., Dumais, S., and O'Brien, G (1995) Using Linear Algebra for Intelligent Information Retrieval. In: SIAM Review, Vol. 37(4), pp.573--595.

See Also

textmatrix, lsa, fold_in

Examples

Run this code
# NOT RUN {
# create some files
td = tempfile()
dir.create(td)
write( c("dog", "cat", "mouse"), file=paste(td, "D1", sep="/"))
write( c("hamster", "mouse", "sushi"), file=paste(td, "D2", sep="/"))
write( c("dog", "monster", "monster"), file=paste(td, "D3", sep="/"))
write( c("dog", "mouse", "dog"), file=paste(td, "D4", sep="/"))

# read files into a document-term matrix
myMatrix = textmatrix(td, minWordLength=1)

# create the latent semantic space
myLSAspace = lsa(myMatrix, dims=dimcalc_raw()) 

# display it as a textmatrix again
round(as.textmatrix(myLSAspace),2) # should give the original

# create the latent semantic space
myLSAspace = lsa(myMatrix, dims=dimcalc_share()) 

# display it as a textmatrix again
myNewMatrix = as.textmatrix(myLSAspace) 
myNewMatrix # should look be different!

# compare two terms with the cosine measure
cosine(myNewMatrix["dog",], myNewMatrix["cat",])

# compare two documents with pearson
cor(myNewMatrix[,1], myNewMatrix[,2], method="pearson")

# clean up
unlink(td, recursive=TRUE)

# }

Run the code above in your browser using DataLab