tm (version 0.6-2)

PCorpus: Permanent Corpora

Description

Create permanent corpora.

Usage

PCorpus(x, readerControl = list(reader = reader(x), language = "en"), dbControl = list(dbName = "", dbType = "DB1"))

Arguments

x
A Source object.
readerControl
a named list of control parameters for reading in content from x, with components:
reader
a function capable of reading in and processing the format delivered by x.

language
a character string giving the language (preferably as an IETF language tag; see language in package NLP). The default language is assumed to be English ("en").

dbControl
a named list of control parameters for the underlying database storage provided by package filehash, with components:
dbName
a character string giving the file name of the database.

dbType
a character string giving the database format (see filehashOption for the available formats).
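The following is a minimal sketch of supplying both control lists explicitly, assuming packages tm and filehash are installed. The two documents are made-up strings and the database path comes from tempfile(), so both are illustrative values rather than anything defined by the package.

```r
## Sketch: assumes tm and filehash are installed; docs and db_file are
## illustrative values, not part of the package.
library(tm)

docs <- c("First document.", "Second document.")
db_file <- tempfile(fileext = ".db")  # temporary path for the filehash database

pc <- PCorpus(VectorSource(docs),
              # readerControl: how each element of the source is read
              readerControl = list(reader = readPlain, language = "en"),
              # dbControl: where and in which format documents are stored
              dbControl = list(dbName = db_file, dbType = "DB1"))

length(pc)            # number of documents read from the source
file.exists(db_file)  # the DB1 database file has been created on disk
content(pc[[2]])      # text of the second document
```

Using tempfile() keeps the sketch from overwriting an existing database in the working directory.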

Value

An object inheriting from PCorpus and Corpus.

Details

A permanent corpus stores its documents outside of R in a database. Because several PCorpus objects backed by the same database can exist in memory at the same time, a change made through one of them is visible through all of them (in contrast to R's usual copy semantics, where modifying one object never affects its copies).
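A minimal sketch of this propagation, assuming tm and filehash are installed and that tm_map on a PCorpus writes the transformed documents back into the shared database. The example texts and the temporary database path are illustrative. A VCorpus is run through the same steps for contrast.

```r
## Sketch: assumes tm and filehash are installed; db_file and the example
## texts are illustrative values.
library(tm)

db_file <- tempfile(fileext = ".db")
pc1 <- PCorpus(VectorSource(c("alpha", "beta")),
               dbControl = list(dbName = db_file, dbType = "DB1"))
pc2 <- pc1  # an ordinary R copy; both objects use the same database

## Transform through pc1 only; for a PCorpus the result is stored in the
## database, so reading through pc2 sees the change.
tm_map(pc1, content_transformer(toupper))
content(pc2[[1]])

## The same steps with a VCorpus (volatile, in-memory storage):
## vc2 keeps its own copy and is not affected.
vc1 <- VCorpus(VectorSource(c("alpha", "beta")))
vc2 <- vc1
vc1 <- tm_map(vc1, content_transformer(toupper))
content(vc2[[1]])
```

Under these assumptions pc2 reflects the update made through pc1, whereas vc2 still holds the original lowercase text.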

See Also

Corpus for basic information on the corpus infrastructure employed by package tm.

VCorpus provides an implementation with volatile storage semantics.

Examples

txt <- system.file("texts", "txt", package = "tm")
## Not run: 
PCorpus(DirSource(txt),
        dbControl = list(dbName = "pcorpus.db", dbType = "DB1"))
## End(Not run)