quanteda (version 4.0.1)

dfm: Create a document-feature matrix

Description

Construct a sparse document-feature matrix from a tokens or dfm object.
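The sketch below shows what a document-feature matrix holds: one row per document, one column per feature (token type), and cells containing counts. It uses only base R and is an illustration, not quanteda's implementation. A real dfm stores these counts sparsely because most features are absent from most documents. The dense `table` here, the toy `toks` list, and the `document`/`feature` labels are assumptions made for the example.

```r
# Toy illustration of the contents of a document-feature matrix
# (not quanteda's implementation): rows are documents, columns are
# features (token types), and cells are counts.
toks <- list(doc1 = c("a", "b", "c"),
             doc2 = c("a", "b", "c", "d"))

# Pair each token with the name of the document it came from
doc_ids <- rep(names(toks), lengths(toks))
feats <- unlist(toks, use.names = FALSE)

# Cross-tabulate documents against features; the factor levels keep
# documents in input order and features in order of first appearance
counts <- table(document = factor(doc_ids, levels = names(toks)),
                feature = factor(feats, levels = unique(feats)))
counts
```

Feature "d" gets a count of 0 for `doc1`. Zeros like this make up most of a real dfm, which is why quanteda uses a sparse representation.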

Usage

dfm(
  x,
  tolower = TRUE,
  remove_padding = FALSE,
  verbose = quanteda_options("verbose"),
  ...
)

Value

a dfm object

Arguments

x

a tokens or dfm object.

tolower

logical; if TRUE, convert all features to lowercase.

remove_padding

logical; if TRUE, remove the "pads" left as empty tokens after calling tokens() or tokens_remove() with padding = TRUE.

verbose

logical; if TRUE, display messages.

...

not used.
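The effect of `tolower` and `remove_padding` on the resulting counts can be sketched in base R. `toy_dfm()` is a hypothetical helper, not part of quanteda. It assumes pads are represented as empty strings (`""`), which matches the `""` feature removed with `dfm_remove(pattern = "")` in the Examples. The `padded` list approximates the tokens that the padding example produces: "b" and "B" are replaced by pads.

```r
# Hypothetical helper (not part of quanteda) that mimics the effect of
# dfm()'s tolower and remove_padding arguments on a named list of
# character vectors.
toy_dfm <- function(toks, tolower = TRUE, remove_padding = FALSE) {
  # Lowercase every token; base::tolower avoids confusion with the argument
  if (tolower) toks <- lapply(toks, base::tolower)
  # Pads are assumed to be empty strings ""; drop them before counting
  if (remove_padding) toks <- lapply(toks, function(doc) doc[doc != ""])
  feats <- unlist(toks, use.names = FALSE)
  table(document = factor(rep(names(toks), lengths(toks)), levels = names(toks)),
        feature = factor(feats, levels = unique(feats)))
}

# Tokens after removing "b" (and "B") with padding
padded <- list(text1 = c("a", "", "c"), text2 = c("A", "", "C", "D"))

toy_dfm(padded)                         # "" column counts one pad per document
toy_dfm(padded, remove_padding = TRUE)  # pad column dropped
toy_dfm(padded, tolower = FALSE)        # "a" and "A" counted as separate features
```

Lowercasing merges case variants into one feature before counting. Removing padding only drops the `""` column and leaves every other count unchanged.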

Changes in version 3

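Many convenience arguments formerly available in dfm() were deprecated in quanteda v3 and removed in v4. Their functionality is now provided by separate functions such as tokens_remove(), dfm_select(), and dfm_group(), and inputs must be tokenized with tokens() before calling dfm().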

See Also

as.dfm(), dfm_select(), dfm-class

Examples

## tokenize a corpus first, then construct the dfm
toks <- data_corpus_inaugural |>
  corpus_subset(Year > 1980) |>
  tokens()
dfm(toks)

# padding and its removal
toks <- tokens(c("a b c", "A B C D")) |>
  tokens_remove("b", padding = TRUE)
toks
dfm(toks)
dfm(toks, remove_padding = TRUE) # drop the "pads" via the argument
dfm(toks) |>
  dfm_remove(pattern = "") # or remove the "pads" afterwards

# preserving case
dfm(toks, tolower = FALSE)
