A function for writing documents out to a .ldac formatted file.
writeLdac(documents, file, zeroindex = TRUE)
A documents object to be written out to a file. The object must be a list in which each element corresponds to a document. Each document is represented as an integer matrix with two rows and one column per unique vocabulary word in the document. The first row contains the 1-indexed vocabulary entry and the second row contains the number of times that term appears.
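As a minimal sketch of the layout described above, the following builds a two-document list by hand in base R. The names doc1, doc2 and the vocabulary indices are made up for illustration; in practice such an object would normally come from a preprocessing step rather than be typed out.

```r
# Two toy documents in the two-row integer matrix layout.
# Row 1: 1-indexed vocabulary entries; row 2: how often each term appears.

# matrix() fills column by column, so each pair below is one column.
doc1 <- matrix(c(1L, 2L,   # vocabulary entry 1 appears 2 times
                 3L, 1L,   # vocabulary entry 3 appears 1 time
                 4L, 5L),  # vocabulary entry 4 appears 5 times
               nrow = 2)

# The same layout built row by row with rbind().
doc2 <- rbind(c(2L, 3L),   # vocabulary entries
              c(1L, 4L))   # counts

# A documents object is a plain list with one matrix per document.
documents <- list(doc1, doc2)
```

Each matrix has exactly two rows, and its number of columns equals the number of distinct vocabulary entries used by that document.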
A character string giving the name of the file to be written. This object is passed directly to the argument con in writeLines and thus can be a connection object as well.
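Because file is handed to writeLines as con, either a file name or an open connection should work. The sketch below uses only base R's writeLines to show both forms; the two .ldac-style lines are made-up sample content, and with the package loaded one would pass the same path or con as the file argument of writeLdac instead.

```r
# Made-up sample content standing in for what would be written.
ldac_lines <- c("3 0:2 2:1 3:5", "2 1:1 2:4")
path <- tempfile(fileext = ".ldac")

# Form 1: con given as a character string (a file name).
writeLines(ldac_lines, con = path)
from_path <- readLines(path)

# Form 2: con given as an explicitly opened connection.
# The caller is responsible for closing a connection it opened.
con <- file(path, open = "w")
writeLines(ldac_lines, con = con)
close(con)
from_con <- readLines(path)
```

Both forms leave the same two lines in the file; the connection form is mainly useful when the destination is something other than a plain path, such as a compressed file opened with gzfile.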
If TRUE (the default), one is subtracted from each index. If FALSE, the indices are used as given. The standard .ldac format indexes from 0, following the convention in most languages. Our documents format indexes from 1 to abide by conventions in R. By default, this option converts to zero-based indexing.
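To make the effect of zeroindex concrete, here is a small base R sketch that formats one document as a single line of text. The helper to_ldac_line is hypothetical and is not the package's implementation; it assumes the usual LDA-C line layout of the number of unique terms followed by space-separated index:count pairs.

```r
# Hypothetical helper (not part of the package): format one document
# as "<number of unique terms> <index>:<count> <index>:<count> ...".
to_ldac_line <- function(doc, zeroindex = TRUE) {
  idx <- doc[1, ]
  if (zeroindex) idx <- idx - 1L   # shift 1-based R indices to 0-based
  pairs <- paste(idx, doc[2, ], sep = ":", collapse = " ")
  paste(ncol(doc), pairs)
}

# Toy document: vocabulary entries 1, 3, 4 with counts 2, 1, 5.
doc <- rbind(c(1L, 3L, 4L),
             c(2L, 1L, 5L))

zero_based <- to_ldac_line(doc, zeroindex = TRUE)   # "3 0:2 2:1 3:5"
one_based  <- to_ldac_line(doc, zeroindex = FALSE)  # "3 1:2 3:1 4:5"
```

Only the indices change between the two settings; the leading term count and the per-term counts are the same.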
This is a simple convenience function for writing out document corpora. Files can be read back into R using readCorpus or simply used for other programs. The output is a file in the .ldac sparse matrix format popularized by Dave Blei's C code for LDA.
# NOT RUN {
# textProcessor, writeLdac and the gadarian data come from the stm package
library(stm)
# Convert the gadarian data into documents format
temp <- textProcessor(documents = gadarian$open.ended.response, metadata = gadarian)
documents <- temp$documents
# Now write out to an ldac file
writeLdac(documents, file = "gadarian.ldac")
# }