quanteda (version 0.9.2-0)

dfm: create a document-feature matrix

Usage

dfm(x, ...)

## S3 method for class 'character'
dfm(x, verbose = TRUE, toLower = TRUE, removeNumbers = TRUE,
    removePunct = TRUE, removeSeparators = TRUE, removeTwitter = FALSE,
    stem = FALSE, ignoredFeatures = NULL, keptFeatures = NULL,
    matrixType = c("sparse", "dense"), language = "english",
    thesaurus = NULL, dictionary = NULL,
    valuetype = c("glob", "regex", "fixed"), dictionary_regex = FALSE, ...)

## S3 method for class 'tokenizedTexts'
dfm(x, verbose = TRUE, toLower = TRUE, stem = FALSE,
    ignoredFeatures = NULL, keptFeatures = NULL,
    matrixType = c("sparse", "dense"), language = "english",
    thesaurus = NULL, dictionary = NULL,
    valuetype = c("glob", "regex", "fixed"), dictionary_regex = FALSE, ...)

## S3 method for class 'corpus'
dfm(x, verbose = TRUE, groups = NULL, ...)

is.dfm(x)

as.dfm(x)
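
As a rough illustration of what the preprocessing switches in the signatures above (toLower, removeNumbers, removePunct) do to the resulting matrix, the following base-R sketch builds a small dense document-feature matrix by hand. It is not quanteda's implementation and does not call quanteda at all: the function build_dfm_sketch and its argument names are hypothetical, and tokenization is assumed to be a plain split on whitespace.

```r
# Base-R sketch of a document-feature matrix; NOT quanteda's implementation.
# build_dfm_sketch and its arguments are hypothetical names for illustration.
build_dfm_sketch <- function(texts, to_lower = TRUE,
                             remove_numbers = TRUE, remove_punct = TRUE) {
  # Give unnamed documents default names so the rows can be labelled.
  if (is.null(names(texts))) {
    names(texts) <- paste0("text", seq_along(texts))
  }
  if (to_lower) texts <- tolower(texts)
  if (remove_numbers) texts <- gsub("[[:digit:]]+", " ", texts)
  if (remove_punct) texts <- gsub("[[:punct:]]+", " ", texts)

  # Assumption: tokens are whitespace-delimited; empty tokens are dropped.
  tokens <- lapply(strsplit(texts, "[[:space:]]+"),
                   function(tok) tok[nzchar(tok)])
  features <- sort(unique(as.character(unlist(tokens))))

  # One row per document, one column per feature, cells hold counts.
  counts <- matrix(0L, nrow = length(tokens), ncol = length(features),
                   dimnames = list(docs = names(texts), features = features))
  for (i in seq_along(tokens)) {
    tab <- table(tokens[[i]])
    if (length(tab) > 0) {
      counts[i, names(tab)] <- as.integer(tab)
    }
  }
  counts
}

example_texts <- c(d1 = "The cat sat, the cat ran!", d2 = "2 dogs ran.")
print(build_dfm_sketch(example_texts))
```

Switching to_lower, remove_numbers or remove_punct to FALSE in this sketch changes which columns appear, which is the same kind of effect the corresponding dfm arguments are documented to have on features.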

Arguments

x
corpus or character vector from which to generate the document-feature matrix
...
additional arguments passed to tokenize, which can include for instance ngrams and concatenator for tokenizing multi-token sequences
verbose
display messages if TRUE
toLower
convert texts to lowercase
removeNumbers
remove numbers, see tokenize
removePunct
remove punctuation, see tokenize
removeSeparators
remove separators (whitespace), see tokenize
removeTwitter
if FALSE, preserve # and @ characters, see tokenize
stem
if TRUE, stem words
ignoredFeatures
a character vector of user-supplied features to