tm (version 0.6-2)

Source: Sources


Creating and accessing sources.


SimpleSource(encoding = "", length = 0, position = 0, reader = readPlain, ..., class)
getSources()
close(con, ...)
eoi(x)
getElem(x)
length(x)
open(con, ...)
pGetElem(x)
reader(x)
stepNext(x)


x: a Source.
con: a Source.
encoding: a character giving the encoding of the elements delivered by the source.
length: a non-negative integer denoting the number of elements delivered by the source. If the length is unknown in advance, set it to 0.
position: a numeric indicating the current position in the source.
reader: a reader function (generator).
...: for SimpleSource, tag-value pairs for storing additional information; not used otherwise.
class: a character vector giving additional classes to be used for the created source.


For SimpleSource, an object inheriting from class, SimpleSource, and Source.

For getSources, a character vector with sources provided by package tm.

open and close return the opened and closed source, respectively.

For eoi, a logical indicating if the end of input of the source is reached.

For getElem, a named list with the components content holding the document and uri giving a uniform resource identifier (e.g., a file path or URL; NULL if not applicable or unavailable). For pGetElem, a list of such named lists.

For length, an integer for the number of elements.

For reader, a function for the default reader.


Sources abstract input locations, like a directory, a connection, or simply an R vector, in order to acquire content in a uniform way. In packages which employ the infrastructure provided by package tm, such sources are represented via the virtual S3 class Source: such packages then provide S3 source classes extending the virtual base class (such as DirSource provided by package tm itself).
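As a small illustration of the class hierarchy described above, the following sketch builds a source from an R vector and inspects its classes. It assumes package tm is installed; the sample strings are made up for this example.

```r
library(tm)

# A source backed by a plain character vector (sample data, purely illustrative).
src <- VectorSource(c("a short text", "another short text"))

# Concrete source classes extend the virtual base class Source.
is_source <- inherits(src, "Source")

# Names of the source constructors shipped with package tm.
available <- getSources()
```

Checking inheritance with inherits() rather than comparing class(src) directly keeps the check valid for any extension class.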

All extension classes must provide implementations for the functions close, eoi, getElem, length, open, reader, and stepNext. For parallel element access the function pGetElem must be provided as well.

The functions open and close open and close the source, respectively. eoi indicates end of input. getElem fetches the element at the current position, whereas pGetElem retrieves all elements in parallel at once. The function length gives the number of elements. reader returns a default reader for processing elements. stepNext increases the position in the source to acquire the next element.
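The access protocol described above can be sketched as a simple loop. This is a minimal illustration using VectorSource with made-up sample strings, not a prescribed usage pattern; the variable names are assumptions of this example.

```r
library(tm)

docs <- c("first document", "second document", "third document")
src <- VectorSource(docs)

src <- open(src)                 # open the source
all_elems <- pGetElem(src)       # parallel access: all elements at once

contents <- character(0)
while (!eoi(src)) {              # loop until end of input is reached
  src <- stepNext(src)           # advance the position to the next element
  elem <- getElem(src)           # named list with components content and uri
  contents <- c(contents, elem$content)
}
src <- close(src)                # close the source
```

Note that stepNext returns the updated source, so its result has to be assigned back; the position starts before the first element, which is why stepNext is called before getElem.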

The function SimpleSource provides a simple reference implementation and can be used when creating custom sources.
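One possible way to build a custom source on top of SimpleSource is sketched below. The class name ListSource and the tag items are hypothetical names chosen for this example; only getElem is implemented here, while close, eoi, length, open, reader, and stepNext are inherited from SimpleSource.

```r
library(tm)

# Hypothetical constructor: a source delivering the elements of an R list.
# The list is stored under the tag "items" (an arbitrary name for this sketch).
ListSource <- function(x) {
  stopifnot(is.list(x))
  SimpleSource(length = length(x), items = x, class = "ListSource")
}

# getElem method for the hypothetical class: returns the element at the
# current position as a named list with components content and uri.
getElem.ListSource <- function(x) {
  list(content = x$items[[x$position]], uri = NULL)
}

src <- ListSource(list("alpha", "beta"))
src <- stepNext(src)             # move to the first element
first <- getElem(src)
```

Because the list has no associated file or URL, uri is set to NULL, which matches the documented return value of getElem for sources without a resource identifier.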

See Also

DataframeSource, DirSource, URISource, VectorSource, and XMLSource.