quanteda (version 1.5.2)

dfm_subset: Extract a subset of a dfm

Description

Returns the documents of a dfm that meet specified conditions, including logical conditions on docvars (document-level variables). dfm_subset works like subset.data.frame, using non-standard evaluation to evaluate conditions against the docvars in the dfm.

Usage

dfm_subset(x, subset, select, ...)

Arguments

x

dfm object to be subsetted

subset

logical expression indicating the documents to keep; missing values are taken as FALSE

select

expression, indicating the docvars to select from the dfm; or a dfm object, in which case the returned dfm will contain the same documents as the original dfm, even if these are empty. See Details.

...

not used

Value

dfm object, with a subset of documents (and docvars) selected according to the arguments

Details

To select or subset features, see dfm_select instead.

When select is a dfm, the returned dfm matches the dfm used for selection in document dimension and order. This is the document-level counterpart of calling dfm_select with a dfm as pattern: dfm_select matches features, while dfm_subset matches documents.
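
Because dfm_subset is described as working like subset.data.frame, the base R function gives a rough picture of how a subset condition is evaluated and how missing values are handled. The following is a minimal sketch using base R only; the data frame dv is a hypothetical stand-in for the docvars of a four-document dfm and is not part of quanteda.

```r
# Hypothetical stand-in for the docvars of a four-document dfm
dv <- data.frame(grp = c(1, NA, 2, 3),
                 row.names = c("d1", "d2", "d3", "d4"))

# The condition is evaluated inside the data frame (non-standard evaluation),
# so `grp` refers to the column rather than to a variable in the workspace
kept <- subset(dv, grp > 1)

# d1 fails the condition; d2 has a missing grp, so its condition is NA,
# which is treated as FALSE. Only d3 and d4 remain.
rownames(kept)
```

With a dfm, the same kind of expression would be passed as the subset argument, for example dfm_subset(dfmat, grp > 1), and evaluated against the docvars rather than against a data frame.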

See Also

subset.data.frame

Examples

# NOT RUN {
corp <- corpus(c(d1 = "a b c d", d2 = "a a b e",
                 d3 = "b b c e", d4 = "e e f a b"),
               docvars = data.frame(grp = c(1, 1, 2, 3)))
dfmat <- dfm(corp)
# selecting on a docvars condition
dfm_subset(dfmat, grp > 1)
# selecting on a supplied vector
dfm_subset(dfmat, c(TRUE, FALSE, TRUE, FALSE))

# selecting on a dfm
dfmat1 <- dfm(c(d1 = "a b b c", d2 = "b b c d"))
dfmat2 <- dfm(c(d1 = "x y z", d2 = "a b c c d", d3 = "x x x"))
dfm_subset(dfmat1, subset = dfmat2)
dfm_subset(dfmat1, subset = dfmat2[c(3, 1, 2), ])
# }
