tm (version 0.7-7)

PCorpus: Permanent Corpora

Description

Create permanent corpora.

Usage

PCorpus(x,
        readerControl = list(reader = reader(x), language = "en"),
        dbControl = list(dbName = "", dbType = "DB1"))

Arguments

x

A Source object.

readerControl

a named list of control parameters for reading in content from x, with the components reader and language.

reader

a function capable of reading in and processing the format delivered by x.

language

a character giving the language (preferably as an IETF language tag; see language in package NLP). The default language is assumed to be English ("en").

dbControl

a named list of control parameters for the underlying database storage provided by package filehash, with the components dbName and dbType.

dbName

a character giving the filename for the database.

dbType

a character giving the database format (see filehashOption for possible database formats).
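
As a rough illustration of where the possible dbType values come from, the sketch below asks package filehash which database formats it currently has registered and which one it uses by default. It assumes that filehash is installed and that filehashFormats() and filehashOption(), called without arguments, return named lists as in current filehash releases; the variable names formats and default_type are purely illustrative.

```r
library(filehash)

# Names of the database formats currently registered with filehash
# (assumption: calling filehashFormats() with no arguments lists them).
formats <- names(filehashFormats())
print(formats)

# The format filehash falls back to when no type is given
# (assumption: stored under the "defaultType" option).
default_type <- filehashOption()$defaultType
print(default_type)
```

Any name reported in formats should be usable as dbType in dbControl.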

Value

An object inheriting from PCorpus and Corpus.

Details

A permanent corpus stores its documents outside of R in a database. Because several PCorpus R objects backed by the same database can exist in memory at the same time, a change made through one of them is visible through all the others (in contrast to R's usual copy-on-modify semantics).
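
The shared-database behaviour described above can be sketched as follows. This is a minimal illustration, not part of the original examples: it assumes that packages tm and filehash are installed, writes to a throwaway file created with tempfile(), and uses the illustrative names a, b, db_file and changed_in_b.

```r
library(tm)

# Build a small permanent corpus from the sample texts shipped with tm.
txt <- system.file("texts", "txt", package = "tm")
db_file <- tempfile(fileext = ".db")  # illustrative throwaway database file
a <- PCorpus(DirSource(txt),
             dbControl = list(dbName = db_file, dbType = "DB1"))

# 'b' is a second R object that refers to the same underlying database.
b <- a

# Replace the first document through 'a' only.
a[[1]] <- PlainTextDocument("replacement text")

# Documents are fetched from the shared database on access, so the
# replacement should also be visible through 'b'.
changed_in_b <- identical(content(b[[1]]), "replacement text")
print(changed_in_b)
```

With a VCorpus, by contrast, the same assignment would modify only a and leave b unchanged.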

See Also

Corpus for basic information on the corpus infrastructure employed by package tm.

VCorpus provides an implementation with volatile storage semantics.

Examples

txt <- system.file("texts", "txt", package = "tm")
## Not run: 
PCorpus(DirSource(txt),
        dbControl = list(dbName = "pcorpus.db", dbType = "DB1"))
## End(Not run)
