runCorpusCa(corpus, dtm = NULL, variables = NULL, sparsity = 0.9, ...)
dtm: if NULL, DocumentTermMatrix will be called on corpus to create it.
sparsity: see removeSparseTerms from tm.
...: passed to ca.
Returns a ca object, as produced by the ca function.

runCorpusCa
runs a correspondence analysis (CA) on the document-term matrix, which is either extracted from corpus by calling the DocumentTermMatrix function or taken directly from the dtm object if present. If no variable is passed via the variables argument, the CA is run on the full document-term matrix (possibly skipping sparse terms, see below). If one or more variables are chosen, the CA is based on a stacked table whose rows correspond to the levels of the variables: each cell contains the sum of occurrences of a given term in all the documents of that level. Documents that have an NA for a variable are skipped for that variable, but are still taken into account for the others, if any.
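The stacked table described above can be illustrated with a minimal base R sketch. It is not the function's actual implementation: the toy matrix, the variable names (party, gender) and the helper stack_levels are all hypothetical, chosen only to show how term counts are summed per level and how an NA affects a single variable.

```r
# Toy document-term matrix: 4 documents (rows) x 3 terms (columns).
dtm <- matrix(c(2, 0, 1,
                1, 1, 0,
                0, 3, 1,
                1, 0, 2),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("doc", 1:4),
                              c("tax", "school", "health")))

# Hypothetical meta-data variables; doc3 has no value for 'party'.
vars <- data.frame(party  = c("A", "B", NA, "A"),
                   gender = c("F", "F", "M", "M"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: for each variable, sum term counts over the
# documents of each level, dropping documents that are NA for that
# variable only, then stack the per-variable blocks row-wise.
stack_levels <- function(dtm, vars) {
  blocks <- lapply(names(vars), function(v) {
    keep <- !is.na(vars[[v]])
    tab <- rowsum(dtm[keep, , drop = FALSE], group = vars[[v]][keep])
    rownames(tab) <- paste(v, rownames(tab))
    tab
  })
  do.call(rbind, blocks)
}

stacked <- stack_levels(dtm, vars)
print(stacked)
```

In this sketch doc3 is left out of the party rows but still contributes to the "gender M" row.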
In all cases, variables that have not been selected are added as supplementary rows. If at least one variable is passed, documents are also supplementary rows, while they are active otherwise.
The sparsity argument is passed to removeSparseTerms to drop the sparsest terms (those that appear in very few documents) from the document-term matrix. This is especially useful for big corpora, whose matrices can grow very large and cause ca to take up too much memory.
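To give an idea of what a sparsity threshold does, here is a rough base R sketch on a toy matrix. It only mimics the principle (a term is kept when it appears in more than a (1 - sparsity) share of the documents) and is an assumption about the rule, not a copy of removeSparseTerms; the matrix and term names are made up.

```r
# Toy document-term matrix: 5 documents (rows) x 3 terms (columns).
dtm <- matrix(c(1, 0, 2,
                1, 0, 0,
                0, 1, 1,
                2, 0, 0,
                1, 0, 3),
              nrow = 5, byrow = TRUE,
              dimnames = list(paste0("doc", 1:5),
                              c("common", "rare", "medium")))

# Number of documents in which each term occurs at least once.
doc_freq <- colSums(dtm > 0)

# Assumed rule: keep terms present in more than (1 - sparsity) of the documents.
sparsity <- 0.7
keep <- doc_freq > nrow(dtm) * (1 - sparsity)

reduced <- dtm[, keep, drop = FALSE]
print(colnames(reduced))
```

With these values the term "rare", which occurs in a single document, is dropped, so the matrix handed to the CA has one column fewer.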
See also: ca, meta, removeSparseTerms, DocumentTermMatrix