detect noise
noise(.Object, ...)
# S4 method for DocumentTermMatrix
noise(.Object, minTotal = 2,
  minTfIdfMean = 0.005, sparse = 0.995, stopwordsLanguage = "german",
  minNchar = 2, specialChars = getOption("polmineR.specialChars"),
  numbers = "^[0-9\\.,]+$", verbose = TRUE)
# S4 method for TermDocumentMatrix
noise(.Object, ...)
# S4 method for character
noise(.Object, stopwordsLanguage = "german",
  minNchar = 2, specialChars = getOption("polmineR.specialChars"),
  numbers = "^[0-9\\.,]+$", verbose = TRUE)
# S4 method for textstat
noise(.Object, p_attribute, ...)
.Object: an object of class "DocumentTermMatrix"
...: further parameters
minTotal: minimum column sum (for a DocumentTermMatrix) for a term to qualify as non-noise
minTfIdfMean: minimum mean tf-idf value for a term to qualify as non-noise
sparse: passed to "removeSparseTerms" from the "tm" package
stopwordsLanguage: e.g. "german", to get the stopwords defined in the tm package
minNchar: minimum character length for a term to qualify as non-noise
specialChars: special characters to drop
numbers: regex, to drop numbers
verbose: logical, whether to be verbose
p_attribute: relevant if applied to a textstat object
Value: a list
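
The criteria described for the character method (stopwords, a number regex, a minimum character length) can be illustrated with a small base-R sketch. This is not the polmineR implementation: `sketch_noise()`, its argument names, the hand-written stopword vector and the names of the returned list elements are all assumptions made for illustration, and the special-character check is left out. Only the `numbers` regex and the default of 2 for the minimum length are taken from the usage above.

```r
# Illustrative base-R sketch; NOT the polmineR implementation.
# sketch_noise() and its arguments are hypothetical names.
sketch_noise <- function(tokens,
                         stopwords = c("der", "die", "das", "und"),
                         min_nchar = 2,
                         numbers = "^[0-9\\.,]+$") {
  list(
    # tokens found in the (hand-written, assumed) stopword list
    stopwords = tokens[tokens %in% stopwords],
    # tokens consisting only of digits, dots and commas
    numbers = grep(numbers, tokens, value = TRUE),
    # tokens shorter than the minimum character length (assumed: strictly shorter)
    minNchar = tokens[nchar(tokens) < min_nchar]
  )
}

tokens <- c("der", "Bundestag", "2015", "1.000,50", "x", "und", "Politik")
flagged <- sketch_noise(tokens)

# Drop every token that was flagged by at least one criterion
clean <- setdiff(tokens, unlist(flagged))
print(clean)
```

With this input, only "Bundestag" and "Politik" remain. Returning one list element per criterion, rather than a single vector, makes it possible to inspect what each rule flagged before removing anything.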