PCorpus
From tm v0.6-2
by Ingo Feinerer
Permanent Corpora
Create permanent corpora.
Usage
PCorpus(x,
        readerControl = list(reader = reader(x), language = "en"),
        dbControl = list(dbName = "", dbType = "DB1"))
Arguments
- x
  A Source object.
- readerControl
  A named list of control parameters for reading in content from x.
  - reader: a function capable of reading in and processing the format delivered by x.
  - language: a character giving the language (preferably as IETF language tags, see language in package NLP). The default language is assumed to be English ("en").
- dbControl
  A named list of control parameters for the underlying database storage provided by package filehash.
  - dbName: a character giving the filename for the database.
  - dbType: a character giving the database format (see filehashOption for possible database formats).
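As an illustration of how the two control lists fit together, the following is a minimal sketch that spells out both of them explicitly. It assumes packages tm and filehash are installed; the reader readPlain, the language tag "la" (the sample texts shipped with tm are in Latin), and the temporary database file name are choices made for this example and not defaults.

```r
library(tm)

# Sample plain-text documents shipped with tm.
txt <- system.file("texts", "txt", package = "tm")

# Illustrative database location; any writable path should work.
db_file <- tempfile(fileext = ".db")

pc <- PCorpus(DirSource(txt),
              # How to read the documents delivered by the source.
              readerControl = list(reader = readPlain, language = "la"),
              # Where and in which filehash format to store them.
              dbControl = list(dbName = db_file, dbType = "DB1"))

n_docs <- length(pc)

# Remove the example database again.
unlink(db_file)
```

If readerControl is omitted, the defaults shown under Usage apply, that is, the reader suggested by the source and the language "en".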
Details
A permanent corpus stores documents outside of R in a database. Since
multiple PCorpus R objects with the same underlying database can exist
simultaneously in memory, changes in one get propagated to all
corresponding objects (in contrast to the default R semantics).
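The shared-database behaviour described above can be sketched as follows. This is an illustration under stated assumptions and not part of the original help page: it assumes packages tm and filehash are installed, and it uses tm_map with content_transformer(toupper) only as a convenient way to modify the stored documents. The variable names and the temporary database file are hypothetical.

```r
library(tm)

txt <- system.file("texts", "txt", package = "tm")
db_file <- tempfile(fileext = ".db")  # illustrative database location

p1 <- PCorpus(DirSource(txt),
              dbControl = list(dbName = db_file, dbType = "DB1"))

# A second R object that refers to the same underlying database.
p2 <- p1

# Content of the first document as seen through p2, before any change.
before <- content(p2[[1]])

# Modify the documents through p1 only.
p1 <- tm_map(p1, content_transformer(toupper))

# Read the same document through p2 again.
after <- content(p2[[1]])

# Remove the example database again.
unlink(db_file)
```

With a VCorpus, p2 would be an independent copy and would keep the original text; here both objects read from the same database, so the text read through p2 is expected to reflect the change made through p1.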
Value
An object inheriting from PCorpus and Corpus.
See Also
Corpus for basic information on the corpus infrastructure employed by package tm.
VCorpus provides an implementation with volatile storage semantics.
Examples
txt <- system.file("texts", "txt", package = "tm")
## Not run: 
PCorpus(DirSource(txt),
        dbControl = list(dbName = "pcorpus.db", dbType = "DB1"))
## End(Not run)