Remove terms from a Document-Term-Matrix and keep only documents which have at least some terms
dtm_remove_terms(dtm, terms)
dtm: an object returned by document_term_matrix, or an object of class DocumentTermMatrix
terms: a character vector of terms which are in colnames(dtm) and which should be removed
A sparse Matrix as returned by sparseMatrix, or an object of class DocumentTermMatrix, from which the indicated terms have been removed, along with any documents that no longer contain any terms at all.
library(udpipe)
data(brussels_reviews_anno)
## keep only the nouns and count their lemmas per document
x <- subset(brussels_reviews_anno, xpos == "NN")
x <- x[, c("doc_id", "lemma")]
x <- document_term_frequencies(x)
dtm <- document_term_matrix(x)
dim(dtm)
## drop a few frequent terms; documents left without any term are dropped as well
dtm <- dtm_remove_terms(dtm, terms = c("appartement", "casa", "centrum", "ciudad"))
dim(dtm)
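The behaviour described under the return value (drop the listed term columns, then drop any document rows that end up empty) can be sketched with plain Matrix operations. This is a minimal illustration, not the udpipe implementation: remove_terms_sketch and the toy matrix are hypothetical, and the sketch assumes a sparse Matrix with terms as column names and non-negative counts.

```r
library(Matrix)

## Hypothetical helper (not part of udpipe): mimics the documented behaviour
## of dtm_remove_terms on a sparse Matrix whose colnames are the terms
remove_terms_sketch <- function(dtm, terms) {
  ## keep only the columns (terms) that are not listed in 'terms'
  dtm <- dtm[, !colnames(dtm) %in% terms, drop = FALSE]
  ## keep only the rows (documents) that still contain at least one term
  dtm[Matrix::rowSums(dtm) > 0, , drop = FALSE]
}

## toy document-term matrix: 3 documents, 3 terms
dtm <- sparseMatrix(i = c(1, 1, 2, 3), j = c(1, 2, 2, 3), x = c(1, 2, 1, 1),
                    dims = c(3, 3),
                    dimnames = list(c("doc1", "doc2", "doc3"),
                                    c("apple", "banana", "cherry")))

## doc3 only contains "cherry", so it disappears together with that term
dtm2 <- remove_terms_sketch(dtm, terms = "cherry")
dim(dtm2)
## [1] 2 2
```

Filtering columns before rows matters here: a document only becomes empty once its remaining terms have been removed.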