udpipe (version 0.3)

dtm_remove_lowfreq: Remove terms occurring with low frequency from a Document-Term-Matrix and documents with no terms

Description

Removes terms which occur with low frequency from a Document-Term-Matrix. Documents which are left without any terms are removed as well.

Usage

dtm_remove_lowfreq(dtm, minfreq = 5, maxterms)

Arguments

dtm

an object returned by document_term_matrix or an object of class DocumentTermMatrix

minfreq

integer with the minimum number of times a term should occur in order to be kept. Defaults to 5.

maxterms

integer indicating the maximum number of terms which should be kept in the dtm. The argument is optional.

Value

a sparse Matrix as returned by sparseMatrix, or an object of class DocumentTermMatrix, from which terms with low occurrence have been removed, as well as documents without any terms
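
The filtering described above can be illustrated on a toy matrix. The sketch below is not the udpipe implementation: sketch_remove_lowfreq is a hypothetical helper written only with the Matrix package, the data are made up, and it assumes that minfreq is compared against the total count of a term over all documents and that maxterms keeps the most frequent terms. Neither assumption is stated in this page.

```r
library(Matrix)

# Toy document-term matrix: 4 documents x 4 terms (made-up counts)
dtm <- sparseMatrix(
  i = c(1, 1, 2, 2, 3, 4),
  j = c(1, 2, 1, 3, 1, 4),
  x = c(3, 1, 4, 2, 5, 1),
  dims = c(4, 4),
  dimnames = list(paste0("doc", 1:4), c("hotel", "room", "metro", "quiet"))
)

# Hypothetical helper, NOT udpipe code: mimics the documented behaviour
sketch_remove_lowfreq <- function(dtm, minfreq = 5, maxterms) {
  # Assumption: term frequency = column sum over all documents
  term_freq <- Matrix::colSums(dtm)
  keep <- term_freq >= minfreq
  if (!missing(maxterms)) {
    # Assumption: when capping, the most frequent terms are the ones kept
    ranked <- order(term_freq, decreasing = TRUE)
    top <- utils::head(ranked[keep[ranked]], maxterms)
    keep <- seq_along(term_freq) %in% top
  }
  dtm <- dtm[, keep, drop = FALSE]
  # Drop documents which have no terms left
  dtm[Matrix::rowSums(dtm) > 0, , drop = FALSE]
}

dim(sketch_remove_lowfreq(dtm, minfreq = 2))
dim(sketch_remove_lowfreq(dtm, minfreq = 2, maxterms = 1))
```

With minfreq = 2, only hotel and metro reach the threshold, so doc4, which contained only quiet, ends up empty and is dropped together with the rare terms.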

Examples

# NOT RUN {
library(udpipe)
## Build a document-term matrix from the noun lemmas in the example data
data(brussels_reviews_anno)
x <- subset(brussels_reviews_anno, xpos == "NN")
x <- x[, c("doc_id", "lemma")]
x <- document_term_frequencies(x)
dtm <- document_term_matrix(x)


## Remove terms with low frequencies and documents with no terms
x <- dtm_remove_lowfreq(dtm, minfreq = 10)
dim(x)
## Same, but also keep at most 25 terms
x <- dtm_remove_lowfreq(dtm, minfreq = 10, maxterms = 25)
dim(x)
# }
