tm (version 0.7-8)

URISource: Uniform Resource Identifier Source

Description

Create a uniform resource identifier source.

Usage

URISource(x, encoding = "", mode = "text")

Value

An object inheriting from URISource, SimpleSource, and Source.

Arguments

x

A character vector of uniform resource identifiers (URIs).

encoding

A character string describing the current encoding. It is passed to iconv to convert the input to UTF-8.
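The conversion to UTF-8 is done by base R's iconv. The sketch below uses only base R, independent of tm. It builds a Latin-1 string by hand and converts it the way a source would when given encoding = "latin1". The byte values are only an example.

```r
# "café" in Latin-1: the final byte 0xE9 encodes "é"
latin1 <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))

# Convert to UTF-8, as happens for the encoding argument
utf8 <- iconv(latin1, from = "latin1", to = "UTF-8")

# In UTF-8, "é" occupies two bytes (0xC3 0xA9)
print(charToRaw(utf8))
```

With tm, the same conversion would be requested as URISource(uris, encoding = "latin1"). Here uris is a hypothetical vector of URIs pointing to Latin-1 encoded files.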

mode

A character string specifying whether and how the URIs should be read. Available modes are:

""

The URIs are not read. In this case getElem and pGetElem deliver only the URIs.

"binary"

URIs are read in binary raw mode (via readBin).

"text"

URIs are read as text (via readLines).
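The "text" and "binary" modes correspond to the base R readers named above. The following sketch uses only base R, not tm, to show roughly what each mode yields for a single file:// URI. The temporary file and its contents are invented for illustration. The sprintf("file://%s", ...) construction follows the Examples section and assumes a Unix-style absolute path.

```r
# Create a small temporary text file (illustrative data only)
path <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), path)

# Build a file:// URI as in the Examples section (assumes a Unix-style path)
uri <- sprintf("file://%s", path)

# Roughly what mode = "text" does: one character element per line
txt <- readLines(uri, warn = FALSE)

# Roughly what mode = "binary" does: the raw bytes of the resource
con <- file(uri, open = "rb")
bytes <- readBin(con, what = "raw", n = file.size(path))
close(con)

print(txt)
print(length(bytes))
```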

Details

A uniform resource identifier source interprets each URI as a document.
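Because each URI becomes one document, a corpus built from a URISource has as many documents as there are URIs. The following is a minimal sketch. It assumes tm is installed and that temporary paths are Unix-style. The two files and their contents are invented for illustration.

```r
library(tm)

# Two illustrative one-line text files
paths <- c(tempfile(fileext = ".txt"), tempfile(fileext = ".txt"))
writeLines("alpha beta", paths[1])
writeLines("gamma delta", paths[2])

# One URI per file; each URI is turned into one document
src <- URISource(sprintf("file://%s", paths))
corp <- VCorpus(src)

print(length(corp))        # number of documents, one per URI
print(content(corp[[2]]))  # text read from the second file
```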

See Also

Source for basic information on the source infrastructure employed by package tm.

Encoding and iconv for information on encodings.

Examples

library(tm)
loremipsum <- system.file("texts", "loremipsum.txt", package = "tm")
ovid <- system.file("texts", "txt", "ovid_1.txt", package = "tm")
us <- URISource(sprintf("file://%s", c(loremipsum, ovid)))
inspect(VCorpus(us))
