quanteda (version 2.1.2)

tokens_subset: Extract a subset of a tokens object

Description

Returns the subset of documents in a tokens object that meet certain conditions, including direct logical operations on docvars (document-level variables). tokens_subset() works in the same way as subset.data.frame(), using non-standard evaluation to evaluate conditions based on the docvars in the tokens object.
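As a minimal sketch of what non-standard evaluation means here: the condition is written with bare docvar names, and it appears that ordinary variables from the calling environment can be mixed into it, as with subset.data.frame(). The docvar grp and the variable min_grp below are made up for illustration.

```r
library(quanteda)

# small illustrative corpus; `grp` is a made-up document-level variable
corp <- corpus(c(d1 = "a b c d", d2 = "a a b e",
                 d3 = "b b c e", d4 = "e e f a b"),
               docvars = data.frame(grp = c(1, 1, 2, 3)))
toks <- tokens(corp)

# `grp` is looked up in the docvars, `min_grp` in the calling environment
min_grp <- 2
res <- tokens_subset(toks, grp >= min_grp)
docnames(res)
```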

Usage

tokens_subset(x, subset, ...)

Arguments

x

tokens object to be subsetted

subset

logical expression indicating the documents to keep: missing values are taken as false

...

not used
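The note on the subset argument that missing values are taken as false can be illustrated with a small sketch. The corpus and the docvar grp are hypothetical; d2 has a missing grp value, so the condition evaluates to NA for it and the document is dropped rather than causing an error.

```r
library(quanteda)

# `grp` is missing for d2, so `grp > 1` evaluates to NA for that document
corp <- corpus(c(d1 = "a b c d", d2 = "a a b e", d3 = "b b c e"),
               docvars = data.frame(grp = c(1, NA, 2)))
toks <- tokens(corp)

# d1 fails the condition, d2 is NA (treated as FALSE), only d3 is kept
kept <- tokens_subset(toks, grp > 1)
docnames(kept)
```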

Value

tokens object, with a subset of documents (and docvars) selected according to arguments
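A short sketch of the returned value, reusing the data from the Examples section: the result is still a tokens object, and the docvars are subsetted along with the documents, so they stay aligned with the documents that remain.

```r
library(quanteda)

corp <- corpus(c(d1 = "a b c d", d2 = "a a b e",
                 d3 = "b b c e", d4 = "e e f a b"),
               docvars = data.frame(grp = c(1, 1, 2, 3)))
toks <- tokens(corp)

sub <- tokens_subset(toks, grp > 1)
is.tokens(sub)       # the result is still a tokens object
ndoc(sub)            # number of documents that met the condition
docvars(sub, "grp")  # docvars for the remaining documents only
```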

See Also

subset.data.frame()

Examples

# NOT RUN {
library(quanteda)

# four short documents with one document-level variable, grp
corp <- corpus(c(d1 = "a b c d", d2 = "a a b e",
                 d3 = "b b c e", d4 = "e e f a b"),
               docvars = data.frame(grp = c(1, 1, 2, 3)))
toks <- tokens(corp)
# selecting on a docvars condition
tokens_subset(toks, grp > 1)
# selecting on a supplied vector
tokens_subset(toks, c(TRUE, FALSE, TRUE, FALSE))
# }
