tm (version 0.7-8)

URISource: Uniform Resource Identifier Source

Description

Create a uniform resource identifier source.

Usage

URISource(x, encoding = "", mode = "text")

Arguments

x

A character vector of uniform resource identifiers (URIs).

encoding

A character string describing the current encoding. It is passed to iconv to convert the input to UTF-8.
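A minimal sketch of the encoding argument, assuming a Unix-like temporary path so that the file:// prefix used in the Examples section forms a valid URI. The temporary file, its Latin-1 bytes, and the names tmp, corpus, and txt are illustrative assumptions, not part of the package.

```r
library(tm)

# Write the Latin-1 bytes for "caf" + e-acute (0xE9), followed by a newline.
tmp <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xe9, 0x0a)), tmp)

# Declare the file's current encoding; the text is converted to UTF-8 on read.
us <- URISource(sprintf("file://%s", tmp), encoding = "latin1")
corpus <- VCorpus(us)

# The document content should now be a valid UTF-8 string.
txt <- content(corpus[[1]])
print(txt)

unlink(tmp)
```

Leaving encoding at its default of "" tells iconv to assume the encoding of the current locale, which may garble files written in a different encoding.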

mode

A character string specifying whether and how URIs should be read in. Available modes are:

""

The URIs are not read. In this case, getElem and pGetElem deliver only the URIs.

"binary"

URIs are read in binary raw mode (via readBin).

"text"

URIs are read as text (via readLines).
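The difference between the modes can be seen by fetching one element by hand. This is a sketch only: it assumes the stepNext and getElem functions of the tm source infrastructure behave as described on the Source help page (getElem returning a list with content and uri components), and it exercises only the "text" and "" modes. The variable names are illustrative.

```r
library(tm)

path <- system.file("texts", "loremipsum.txt", package = "tm")
uri <- sprintf("file://%s", path)

# mode = "text": the element's content holds the lines of the file.
src_text <- stepNext(URISource(uri, mode = "text"))
elem_text <- getElem(src_text)
is.character(elem_text$content)

# mode = "": nothing is read; only the URI is delivered.
src_none <- stepNext(URISource(uri, mode = ""))
elem_none <- getElem(src_none)
elem_none$uri
```

The "" mode can be useful when a custom reader should open the resource itself and needs only its location.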

Value

An object inheriting from URISource, SimpleSource, and Source.

Details

A uniform resource identifier source interprets each URI as a document.
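The one-URI-per-document behavior can be sketched by stepping through a source manually. This assumes the open, eoi, stepNext, getElem, and close functions documented on the Source help page; the loop is an illustration and is not necessarily how corpus constructors traverse a source internally.

```r
library(tm)

uris <- sprintf("file://%s", c(
  system.file("texts", "loremipsum.txt", package = "tm"),
  system.file("texts", "txt", "ovid_1.txt", package = "tm")
))

src <- open(URISource(uris))
seen <- character(0)

# Each step advances to the next URI; each element is one document.
while (!eoi(src)) {
  src <- stepNext(src)
  elem <- getElem(src)
  seen <- c(seen, elem$uri)
}
src <- close(src)

# A corpus built from the same source has one document per URI.
n_docs <- length(VCorpus(URISource(uris)))
```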

See Also

Source for basic information on the source infrastructure employed by package tm.

Encoding and iconv on encodings.

Examples

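library(tm)

# Locate two sample texts shipped with the tm package.
loremipsum <- system.file("texts", "loremipsum.txt", package = "tm")
ovid <- system.file("texts", "txt", "ovid_1.txt", package = "tm")

# Turn the file paths into file:// URIs and build a source from them.
us <- URISource(sprintf("file://%s", c(loremipsum, ovid)))

# Each URI becomes one document in the corpus.
inspect(VCorpus(us))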