textreg (version 0.1.5)

build.corpus: Build a corpus that can be used in the textreg call.

Description

Pre-building a corpus lets you call textreg multiple times without repeating the initial data processing (e.g., if you want to explore different ban lists or regularization parameters).

Usage

build.corpus(corpus, labeling, banned = NULL, verbosity = 1,
  token.type = "word")

Arguments

corpus

A list of strings or a corpus from the tm package.

labeling

A vector of +1/-1 or TRUE/FALSE values indicating which documents are considered relevant and which are baseline. A +1/-1 vector can also contain 0, which means the document is dropped.

banned

List of words that should be dropped from consideration.

verbosity

Level of output. 0 is no printed output.

token.type

Whether to use "word" or "character" tokens.
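
As a rough illustration of the labeling argument, the sketch below converts a logical vector into the +1/-1 coding and marks one document with 0 so that it is dropped. The names is.relevant and unusable are hypothetical and are not part of the package.

```r
# Hypothetical relevance flags for five documents (illustration only)
is.relevant <- c(TRUE, FALSE, TRUE, FALSE, FALSE)

# +1 for relevant documents, -1 for baseline documents
labeling <- ifelse(is.relevant, 1, -1)

# Mark the fourth document with 0 so that it is dropped
unusable <- 4
labeling[unusable] <- 0

print(labeling)
```

A TRUE/FALSE vector can be passed directly, but it has no way to express "drop this document", which is why the numeric coding is used here.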

Value

A textreg.corpus object.

Details

See the bathtub vignette for a more complete discussion of this method and the options you might pass to it.

A textreg.corpus object is not a tm-style corpus. In particular, any text pre-processing should be applied to the data before building the textreg.corpus object.
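
The following is a minimal sketch of that ordering: clean the raw text with base R first, then build the corpus. The documents, labels, and ban list are made up for illustration, and passing a plain character vector as the corpus is an assumption about what "a list of strings" accepts. The build.corpus call is guarded so the cleaning steps run even if textreg is not installed.

```r
# Hypothetical raw documents and labels (not from the package)
raw.docs <- c("The Bathtub was FULL!",
              "No water; no problem.",
              "Water, water everywhere...")
labeling <- c(1, -1, 1)

# Clean the text first: lower-case, replace punctuation, collapse whitespace
clean.docs <- tolower(raw.docs)
clean.docs <- gsub("[[:punct:]]+", " ", clean.docs)
clean.docs <- trimws(gsub("[[:space:]]+", " ", clean.docs))

# Build the corpus only if textreg is available. Passing a character
# vector here is an assumption; the arguments follow the Usage section.
if (requireNamespace("textreg", quietly = TRUE)) {
  corp <- textreg::build.corpus(clean.docs, labeling,
                                banned = c("the"), verbosity = 0)
}
```

The cleaning steps shown are only one possible choice. Whatever pre-processing you need has to happen before this call, since the resulting object is not meant to be transformed afterwards.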

Examples

# NOT RUN {
library( textreg )
data( testCorpora )
# Fit directly from the raw corpus and labeling, with an empty ban list
textreg( testCorpora$testI$corpus, testCorpora$testI$labelI, c(), C=1, verbosity=1 )
# }