Corpus

Constructs a corpus (= text document collection).

Keywords
methods
Usage
## S4 method for signature 'Source':
Corpus(object, readerControl = list(reader = object@DefaultReader,
language = "en_US", load = TRUE), dbControl = list(useDb = FALSE, dbName = "",
dbType = "DB1"), ...)
Arguments
object
A Source object.
readerControl
A list with the named components reader, a reading function capable of handling the file format found in object, and language, giving the text's language (preferably in ISO 639-1
dbControl
A list with the named components useDb indicating that database support should be activated, dbName giving the filename holding the sourced out objects (i.e., the database), and dbType holding a valid dat
...
Optional arguments for the reader.
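As a minimal sketch of how readerControl selects the reader and the language, the snippet below builds a small corpus from an in-memory character vector. It assumes a later tm release (such as the v0.7-5 used in the community example on this page), where the constructor is VCorpus and the source is VectorSource; the load and dbControl settings of the signature documented here are not covered. The sample texts are made up for illustration.

```r
library(tm)

# Hypothetical sample texts, used only for illustration.
docs <- c("The first short document.",
          "A second, slightly longer document.")

# readerControl names the reading function and the language of the texts.
# readPlain treats each element of the source as plain text.
corpus <- VCorpus(
  VectorSource(docs),
  readerControl = list(reader = readPlain, language = "en")
)

# One document per element of the input vector.
n_docs <- length(corpus)

# The language given in readerControl is stored in each document's metadata.
doc_language <- meta(corpus[[1]], "language")
```

Inspecting n_docs and doc_language is a quick way to confirm that the reader was applied to every element and that the language setting reached the documents.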
Value

  • An S4 object of class Corpus which extends the class list containing a collection of text documents.
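Because the returned corpus behaves like a list of text documents, it can be measured and indexed with the usual list tools. The sketch below illustrates this under the assumption of a later tm release (for example v0.7-5), where VCorpus and VectorSource are available; the object classes in version 0.3-4.1 may differ. The two sample strings are made up for illustration.

```r
library(tm)

# Hypothetical two-document corpus, built only to demonstrate list-style access.
corpus <- VCorpus(VectorSource(c("alpha beta", "gamma")))

# The corpus has one element per document.
n_docs <- length(corpus)

# Double brackets extract a single text document.
first_doc <- corpus[[1]]

# as.character() returns the text held by a plain text document.
first_text <- as.character(first_doc)

# Iterate over the documents by index, here to count characters per document.
doc_lengths <- vapply(
  seq_along(corpus),
  function(i) nchar(as.character(corpus[[i]])),
  integer(1)
)
```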

Aliases
  • Corpus
  • Corpus,Source-method
  • coerce,list,Corpus-method
Examples
txt <- system.file("texts", "txt", package = "tm")
(Corpus(DirSource(txt),
        readerControl = list(reader = readPlain, language = "en_US", load = TRUE),
        dbControl = list(useDb = TRUE, dbName = "oviddb", dbType = "DB1")))
reut21578 <- system.file("texts", "reut21578", package = "tm")
Corpus(DirSource(reut21578),
       readerControl = list(reader = readReut21578XML, language = "en_US", load = FALSE))
Documentation reproduced from package tm, version 0.3-4.1, License: GPL-2

Community examples

sprasha6 at Nov 25, 2018 tm v0.7-5

## WORD CLOUD example
# install.packages('tm')
# install.packages('SnowballC')
library(tm)
library(SnowballC)

# load the dataset
dataset_original = read.csv(file.choose(), stringsAsFactors = FALSE)
corpus = VCorpus(VectorSource(dataset_original$Review))
corpus = tm_map(corpus, content_transformer(tolower))
corpus = tm_map(corpus, removeNumbers)
corpus = tm_map(corpus, removePunctuation)
corpus = tm_map(corpus, removeWords, stopwords())
corpus = tm_map(corpus, stemDocument)
corpus = tm_map(corpus, stripWhitespace)

# Creating the Bag of Words model
dtm = DocumentTermMatrix(corpus)
dtm = removeSparseTerms(dtm, 0.999)
dataset = as.data.frame(as.matrix(dtm))
dataset$Liked = dataset_original$Liked

# Encoding the target feature as factor
dataset$Liked = factor(dataset$Liked, levels = c(0, 1))

# wordCloud
library(wordcloud)
dtm = DocumentTermMatrix(corpus)
dtm = removeSparseTerms(dtm, 0.999)
dataset = as.matrix(dtm)
v = sort(colSums(dataset), decreasing = TRUE)
myNames = names(v)
d = data.frame(word = myNames, freq = v)
wordcloud(d$word, colors = c(3, 4), random.color = FALSE, d$freq, min.freq = 80)