tm (version 0.7-4)

Source: Sources

Description

Creating and accessing sources.

Usage

SimpleSource(encoding = "",
             length = 0,
             position = 0,
             reader = readPlain,
             ...,
             class)
getSources()
# S3 method for SimpleSource
close(con, ...)
# S3 method for SimpleSource
eoi(x)
# S3 method for DataframeSource
getMeta(x)
# S3 method for DataframeSource
getElem(x)
# S3 method for DirSource
getElem(x)
# S3 method for URISource
getElem(x)
# S3 method for VectorSource
getElem(x)
# S3 method for XMLSource
getElem(x)
# S3 method for SimpleSource
length(x)
# S3 method for SimpleSource
open(con, ...)
# S3 method for DataframeSource
pGetElem(x)
# S3 method for DirSource
pGetElem(x)
# S3 method for URISource
pGetElem(x)
# S3 method for VectorSource
pGetElem(x)
# S3 method for SimpleSource
reader(x)
# S3 method for SimpleSource
stepNext(x)

Arguments

x

A Source.

con

A Source.

encoding

a character giving the encoding of the elements delivered by the source.

length

a non-negative integer denoting the number of elements delivered by the source. If the length is unknown in advance, set it to 0.

position

a numeric indicating the current position in the source.

reader

a reader function (generator).

...

For SimpleSource, tag-value pairs for storing additional information; not used otherwise.

class

a character vector giving additional classes to be used for the created source.

Value

For SimpleSource, an object inheriting from class, SimpleSource, and Source.

For getSources, a character vector with sources provided by package tm.

open and close return the opened and closed source, respectively.

For eoi, a logical indicating if the end of input of the source is reached.

For getElem a named list with the components content holding the document and uri giving a uniform resource identifier (e.g., a file path or URL; NULL if not applicable or unavailable). For pGetElem a list of such named lists.

For length, an integer for the number of elements.

For reader, a function for the default reader.
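As a rough illustration of the return values described above, the following sketch inspects a VectorSource built from a small character vector. The vector contents are made up for illustration; the functions called are the ones listed under Usage.

```r
library(tm)

# Illustrative two-element input; any character vector would do.
src <- VectorSource(c("first document", "second document"))

# Names of the source classes provided by package tm.
getSources()

# Number of elements and the default reader function.
length(src)
is.function(reader(src))

# A new source starts before its first element, so step once, then fetch.
src <- stepNext(src)
elem <- getElem(src)
names(elem)          # components: content and uri
elem$content
is.null(elem$uri)    # no URI applies to an in-memory vector

# All elements at once: a list of such named lists.
all_elems <- pGetElem(src)
length(all_elems)
```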

Details

Sources abstract input locations, like a directory, a connection, or simply an R vector, in order to acquire content in a uniform way. In packages which employ the infrastructure provided by package tm, such sources are represented via the virtual S3 class Source: such packages then provide S3 source classes extending the virtual base class (such as DirSource provided by package tm itself).

All extension classes must provide implementations for the functions close, eoi, getElem, length, open, reader, and stepNext. For parallel element access the (optional) function pGetElem must be provided as well. If document-level metadata is available, the (optional) function getMeta must be implemented.

The functions open and close open and close the source, respectively. eoi indicates end of input. getElem fetches the element at the current position, whereas pGetElem retrieves all elements in parallel at once. The function length gives the number of elements. reader returns a default reader for processing elements. stepNext increases the position in the source to acquire the next element.
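The access protocol described above can be sketched as a simple loop. This is a minimal example using a VectorSource over made-up data, not a definitive recipe. Note that stepNext returns the advanced source rather than modifying it in place, so the result has to be assigned back.

```r
library(tm)

# Illustrative input data.
src <- VectorSource(c("alpha", "beta", "gamma"))

src <- open(src)
contents <- character(0)

# Advance, then fetch, until the end of input is reached.
while (!eoi(src)) {
  src <- stepNext(src)                   # keep the returned, advanced source
  elem <- getElem(src)                   # named list with content and uri
  contents <- c(contents, elem$content)
}

src <- close(src)

contents
```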

The function SimpleSource provides a simple reference implementation and can be used when creating custom sources.
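A custom source built on SimpleSource might look like the sketch below. The class name ListSource and its constructor are hypothetical and exist only for this illustration. The sketch also assumes that the object returned by SimpleSource is a list whose position field and extra tag-value pairs (here content) can be read with `$`. Only getElem is defined for the new class; the remaining functions are inherited from SimpleSource.

```r
library(tm)

# Hypothetical constructor: a source over the elements of a list.
ListSource <- function(x) {
  SimpleSource(length = length(x),
               content = x,            # stored as an additional tag-value pair
               class = "ListSource")
}

# Element access for the hypothetical class.
# Assumes the fields position and content are accessible via `$`.
getElem.ListSource <- function(x) {
  list(content = x$content[[x$position]],
       uri = NULL)                     # no URI available for in-memory data
}

src <- ListSource(list("some text", "more text"))
inherits(src, c("ListSource", "SimpleSource", "Source"), which = TRUE)

# eoi, length, and stepNext come from the SimpleSource methods.
src <- stepNext(src)
first <- getElem(src)
first$content
```

Reusing SimpleSource this way keeps the custom class small: only the element access that differs from the reference implementation needs to be written.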

See Also

DataframeSource, DirSource, URISource, VectorSource, and XMLSource.