dfm: create a document-feature matrix

Usage

dfm(x, ...)
## S3 method for class 'character':
dfm(x, verbose = TRUE, toLower = TRUE,
  removeNumbers = TRUE, removePunct = TRUE, removeSeparators = TRUE,
  removeTwitter = FALSE, stem = FALSE, ignoredFeatures = NULL,
  keptFeatures = NULL, matrixType = c("sparse", "dense"),
  language = "english", thesaurus = NULL, dictionary = NULL,
  valuetype = c("glob", "regex", "fixed"), dictionary_regex = FALSE, ...)
## S3 method for class 'tokenizedTexts':
dfm(x, verbose = TRUE, toLower = TRUE,
  stem = FALSE, ignoredFeatures = NULL, keptFeatures = NULL,
  matrixType = c("sparse", "dense"), language = "english",
  thesaurus = NULL, dictionary = NULL, valuetype = c("glob", "regex",
  "fixed"), dictionary_regex = FALSE, ...)
## S3 method for class 'corpus':
dfm(x, verbose = TRUE, groups = NULL, ...)
is.dfm(x)
as.dfm(x)
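
As a rough illustration of the character method shown above, here is a minimal sketch. It assumes the documented function comes from an older release of the quanteda package whose dfm() still accepts these arguments; the package name and the sample texts are assumptions and do not appear in this page.

```r
# Assumption: dfm() as documented here is provided by an older quanteda release.
library(quanteda)

# Hypothetical sample texts, used only for illustration
txts <- c("The quick brown fox.",
          "The fox jumped over 2 lazy dogs!")

# Character method: lowercase the texts, drop numbers and punctuation,
# and leave words unstemmed; verbose = FALSE suppresses progress messages
d <- dfm(txts, verbose = FALSE, toLower = TRUE,
         removeNumbers = TRUE, removePunct = TRUE, stem = FALSE)

# Check that the result is a document-feature matrix
is.dfm(d)
```

Every argument used here is taken from the usage signature above; the defaults would give the same preprocessing, so they are spelled out only to make the options visible.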

Arguments

x

corpus or character vector from which to generate the document-feature matrix

...

additional arguments passed to tokenize, which can include for instance ngrams and concatenator for tokenizing multi-token sequences

verbose

display messages if TRUE

toLower

convert texts to lowercase

removeNumbers

remove numbers, see tokenize

removePunct

remove punctuation, see tokenize

removeSeparators

remove separators (whitespace), see tokenize

removeTwitter

if FALSE, preserve # and @ characters, see tokenize

stem

if TRUE, stem words

ignoredFeatures{a character vector of user-supplied features to