tm (version 0.5-10)

Source: Create and Access Sources

Description

Create and access sources which abstract input locations, like a directory, a connection, or simply an R vector.

Usage

Source(defaultReader = readPlain, encoding = "unknown", length = NA_integer_,
       names = NA_character_, position = 0, vectorized = TRUE, class)
is.Source(x)
## S3 method for class 'Source':
eoi(x)
## S3 method for class 'DataframeSource':
getElem(x)
## S3 method for class 'DirSource':
getElem(x)
## S3 method for class 'URISource':
getElem(x)
## S3 method for class 'VectorSource':
getElem(x)
## S3 method for class 'XMLSource':
getElem(x)
## S3 method for class 'DataframeSource':
pGetElem(x)
## S3 method for class 'DirSource':
pGetElem(x)
## S3 method for class 'VectorSource':
pGetElem(x)
## S3 method for class 'Source':
stepNext(x)

Arguments

x
a source.
defaultReader
a reader function (generator).
encoding
a character specifying the encoding of the elements delivered by the source.
length
an integer denoting the number of elements delivered by the source. If the length is unknown in advance, it must be set to NA_integer_.
names
a character vector giving element names.
position
a numeric indicating the position in the source.
vectorized
a logical indicating whether the source supports parallel element access.
class
a character vector for extending the class attribute.

Details

The function Source is a constructor and should be used when creating custom sources. Internally a source is represented as a list with the class attribute Source. Custom sources may extend the internal list structure with additional named components.

The function is.Source returns TRUE for a valid source and FALSE otherwise.
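The representation described above can be illustrated with a minimal base R sketch. It does not call tm itself: makeSketchSource, looksLikeSource, and the component names length, position, and items are hypothetical and chosen for illustration only, and the class-attribute check is an assumption about what a validity test might look like, not the actual definition of is.Source.

```r
# Hypothetical constructor, for illustration only (not part of tm).
# It builds a list with a class attribute ending in "Source" and extends
# the list with an additional named component `items`.
makeSketchSource <- function(items, class = "SketchSource") {
  structure(
    list(length = length(items), position = 0, items = items),
    class = unique(c(class, "Source"))
  )
}

# Assumed stand-in for a validity check based on the class attribute.
looksLikeSource <- function(x) inherits(x, "Source")

s <- makeSketchSource(c("a", "b", "c"))
looksLikeSource(s)       # the sketch source carries the "Source" class
looksLikeSource(list())  # a plain list does not
```

Placing the custom class before "Source" lets S3 dispatch try methods for the more specific class first and fall back to methods defined for Source.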

Each source must provide implementations for the three interface functions eoi, getElem, and stepNext. If vectorized is set, pGetElem must be implemented as well. The function eoi returns TRUE once the end of input of the source has been reached. getElem fetches the element at the current position, whereas pGetElem retrieves all elements at once, in parallel. Each retrieved element must be encapsulated in a list with two named components: content, holding the document, and uri, pointing to the origin of the document (e.g., a file path or a connection; NA if not applicable or unavailable). stepNext advances the position in the source to the next element.
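The interface can be sketched in base R as follows. This is an illustration under stated assumptions rather than tm's implementation: the generics are defined locally as stand-ins for the package's own, and LinesSource and its components length, position, and lines are hypothetical names.

```r
# Local stand-ins for the interface generics (assumption: defined here so
# the sketch runs without tm; they are not the package's own definitions).
eoi      <- function(x) UseMethod("eoi")
getElem  <- function(x) UseMethod("getElem")
pGetElem <- function(x) UseMethod("pGetElem")
stepNext <- function(x) UseMethod("stepNext")

# Hypothetical custom source over a character vector.
LinesSource <- function(lines) {
  structure(list(length = length(lines), position = 0, lines = lines),
            class = c("LinesSource", "Source"))
}

# End of input is reached once the position has caught up with the length.
eoi.Source <- function(x) x$length <= x$position

# Advance to the next element and return the updated source.
stepNext.Source <- function(x) {
  x$position <- x$position + 1
  x
}

# Fetch the element at the current position as list(content, uri).
getElem.LinesSource <- function(x) {
  list(content = x$lines[x$position], uri = NA)
}

# Fetch all elements at once, one list(content, uri) per element.
pGetElem.LinesSource <- function(x) {
  lapply(x$lines, function(l) list(content = l, uri = NA))
}

# Sequential access: step, then read, until the end of input.
src <- LinesSource(c("first document", "second document"))
docs <- character(0)
while (!eoi(src)) {
  src <- stepNext(src)
  docs <- c(docs, getElem(src)$content)
}

# Vectorized access to the same elements.
all_elems <- pGetElem(LinesSource(c("first document", "second document")))
```

Because the position starts at 0 in this sketch, the loop calls stepNext before getElem. stepNext returns the modified source, so the caller has to reassign it.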

See Also

getSources to list available sources; readPlain; Encoding.