
R.temis (version 0.1.4)

subset_corpus: subset_corpus

Description

Select documents containing (or not containing) one or more terms.

Usage

subset_corpus(corpus, dtm, terms, exclude = FALSE, all = FALSE)

Value

A Corpus object containing only the selected documents.

Arguments

corpus

A Corpus object.

dtm

A DocumentTermMatrix object corresponding to corpus.

terms

One or more terms appearing in dtm.

exclude

Whether documents containing the terms should be excluded rather than retained.

all

Whether a document must contain all of the terms, rather than at least one, to be retained (or excluded, when exclude = TRUE). By default (FALSE), a document matches if it contains at least one of the terms.
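
The interaction between exclude and all can be illustrated on a small count matrix in base R. This is only a sketch of the selection logic described above, not the package's actual implementation: the helper select_docs and the toy matrix m are hypothetical, and a real DocumentTermMatrix is a sparse object rather than a plain matrix.

```r
# Hypothetical toy document-term counts (rows = documents, columns = terms).
m <- matrix(c(2, 0,
              1, 3,
              0, 0,
              0, 1),
            ncol = 2, byrow = TRUE,
            dimnames = list(paste0("doc", 1:4), c("barrel", "opec")))

# Hypothetical helper mirroring the documented arguments of subset_corpus();
# it returns the names of the selected documents instead of a Corpus.
select_docs <- function(m, terms, exclude = FALSE, all = FALSE) {
  # TRUE where a document contains a given term at least once
  present <- m[, terms, drop = FALSE] > 0
  # all = TRUE: every term must be present; otherwise one term is enough
  hit <- if (all) rowSums(present) == length(terms) else rowSums(present) > 0
  # exclude = TRUE: keep the documents that did NOT match
  if (exclude) hit <- !hit
  rownames(m)[hit]
}

select_docs(m, c("barrel", "opec"))                  # "doc1" "doc2" "doc4"
select_docs(m, c("barrel", "opec"), all = TRUE)      # "doc2"
select_docs(m, c("barrel", "opec"), exclude = TRUE)  # "doc3"
```

With both exclude = TRUE and all = TRUE, the sketch drops only the documents containing every term, so doc1, doc3 and doc4 remain.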

Examples


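library(R.temis)

# Sample Factiva file shipped with the tm.plugin.factiva package
file <- system.file("texts", "reut21578-factiva.xml", package="tm.plugin.factiva")
corpus <- import_corpus(file, "factiva", language="en")
dtm <- build_dtm(corpus)

# Documents containing "barrel"
subset_corpus(corpus, dtm, "barrel")
# Documents containing at least one of "barrel" and "opec"
subset_corpus(corpus, dtm, c("barrel", "opec"))
# Documents containing neither "barrel" nor "opec"
subset_corpus(corpus, dtm, c("barrel", "opec"), exclude=TRUE)
# Documents containing both "barrel" and "opec"
subset_corpus(corpus, dtm, c("barrel", "opec"), all=TRUE)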
