stem.corpus: Stem corpus with annotation.

Description

Given a VCorpus of original text, returns a VCorpus of stemmed text with '+' appended to all stemmed words.

Usage

stem.corpus(corpus, verbose = TRUE)

Arguments

corpus

A VCorpus of the original text.

verbose

If TRUE, display a progress bar while the corpus is processed.

Details

This code is not optimized and is expensive to run. First, the stemmer chops each word. The method then makes a pass that appends "+" to every chopped word and builds a list of stems. Finally, it makes a second pass that appends "+" to any word that matches one of those stems but had no suffix to remove.

For example, "goblins" and "goblin" both become "goblin+".
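
The two passes described above can be sketched on a plain character vector. This is a minimal illustration, not the package's implementation: `toy_stem` and `annotate_stems` are hypothetical names, and the toy suffix-stripper only stands in for the real Snowball stemmer.

```r
# Toy stand-in for a real stemmer (assumption): strips a trailing "ing" or "s".
toy_stem <- function(words) sub("(ing|s)$", "", words)

# Hypothetical helper showing the two annotation passes on a vector of words.
annotate_stems <- function(words, stemmer = toy_stem) {
  stemmed <- stemmer(words)
  chopped <- stemmed != words           # words the stemmer shortened
  stems   <- unique(stemmed[chopped])   # list of stems collected in pass 1

  # Pass 1: append "+" to every chopped word.
  stemmed[chopped] <- paste0(stemmed[chopped], "+")

  # Pass 2: append "+" to unchopped words that match a collected stem.
  bare <- !chopped & stemmed %in% stems
  stemmed[bare] <- paste0(stemmed[bare], "+")

  stemmed
}

words <- c("goblins", "goblin", "texting", "text", "the")
annotate_stems(words)
```

With this toy stemmer, "goblin" and "text" are marked in the second pass because they match stems produced from "goblins" and "texting", while "the" is left unmarked.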

Code based on code from Kevin Wu, UC Berkeley Undergrad Thesis 2014.

Requires, via the tm package, the SnowballC package.
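
One way to confirm that both packages are available before calling the function is sketched below. The variable names are illustrative only, and the check uses nothing beyond base R's `requireNamespace`.

```r
# Check that the packages named above are installed (does not attach them).
needed <- c("tm", "SnowballC")
available <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
absent <- needed[!available]

if (length(absent) > 0) {
  message("Missing packages: ", paste(absent, collapse = ", "))
}
```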

Examples

## Not run: 
# library( tm )
# texts <- c("texting goblins the dagger", "text these goblins",
#             "texting 3 goblins appl daggers goblining gobble")
# corpus <- Corpus(VectorSource(texts))
# stemmed_corpus<-stem.corpus(corpus, verbose=FALSE)
# stemmed_corpus[[2]]
# ## End(Not run)
