tm.plugin.dc (version 0.1-7)

dc_storage: Virtual Storage Class

Description

When using class DistributedCorpus, the underlying virtual storage plays an important role. It defines how to use the given storage (read/write methods, etc.), where the data is to be stored (i.e., the base directory on the file system), and how transformations and term-document matrix construction are to be performed.

Usage

dc_storage( x, ... )
`dc_storage<-`( x, value )
dc_storage_create(type = c("local_disk", "HDFS"), base_dir, chunksize = 1024^2)
as.dc_storage( ds, ... )
is.dc_storage( ds )

Arguments

x
A distributed corpus.
value
The new storage of class dc_storage attached to the distributed corpus.
type
The type of the storage to be created. Currently only local_disk and HDFS storage types are supported.
base_dir
The base directory where the distributed corpus data is to be stored.
chunksize
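The size of each chunk written to the virtual storage. The default is 1024^2; the example below passes 50 * 1024^2 for 50MB chunks, so the value is given in bytes.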
ds
A virtual storage.
...
Further arguments passed to the corresponding methods.
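
The chunksize argument can be worked out with plain arithmetic before calling dc_storage_create(). The following is a minimal base R sketch, not part of tm.plugin.dc: the helpers mib_to_bytes() and estimate_chunks() are hypothetical names introduced here, and the chunk count assumes that each chunk is filled up to chunksize bytes, which this page does not state.

```r
# Hypothetical helper (not part of tm.plugin.dc):
# convert a size in MB (1 MB = 1024^2 bytes) to bytes.
mib_to_bytes <- function(mib) mib * 1024^2

default_chunksize <- 1024^2            # default shown in the Usage section
large_chunksize   <- mib_to_bytes(50)  # value used in the Examples section

# Hypothetical helper: rough number of chunks for a corpus of a given size,
# ASSUMING chunks are filled up to `chunksize` bytes (not confirmed by the docs).
estimate_chunks <- function(total_bytes, chunksize) {
  ceiling(total_bytes / chunksize)
}

total <- mib_to_bytes(120)  # hypothetical corpus of 120 MB

estimate_chunks(total, default_chunksize)
estimate_chunks(total, large_chunksize)
```

Under that assumption, a larger chunksize yields fewer, larger chunks. The value computed this way can be passed as the chunksize argument of dc_storage_create().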

Value

  • An object which inherits from class dc_storage or, in the case of is.dc_storage(), a logical indicating whether ds inherits from dc_storage.

Examples

library("tm")            # provides the 'crude' data set
library("tm.plugin.dc")

## extract storage from 'DistributedCorpus'
data(crude)
dc <- as.DistributedCorpus( crude )
dc_storage( dc )

## create a new storage using 50MB chunks
dcs <- dc_storage_create(type = "local_disk", base_dir = tempfile(),
                         chunksize = 50 * 1024^2)
is.dc_storage( dcs )