tm (version 0.7-8)

DirSource: Directory Source

Description

Create a directory source.

Usage

DirSource(directory = ".",
          encoding = "",
          pattern = NULL,
          recursive = FALSE,
          ignore.case = FALSE,
          mode = "text")

Value

An object inheriting from DirSource, SimpleSource, and Source.

Arguments

directory

a character vector of full path names; the default corresponds to the working directory getwd().

encoding

a character string describing the current encoding. It is passed to iconv to convert the input to UTF-8.

pattern

an optional regular expression. Only file names which match the regular expression will be returned.

recursive

logical. Should the listing recurse into directories?

ignore.case

logical. Should pattern-matching be case-insensitive?

mode

a character string specifying if and how files should be read in. Available modes are:

""

No read. In this case getElem and pGetElem only deliver URIs.

"binary"

Files are read in binary raw mode (via readBin).

"text"

Files are read as text (via readLines).
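
The following is a minimal sketch of how the three modes differ, not an excerpt from the package documentation. It assumes package tm is installed; the scratch directory, the file a.txt, and the helper first_elem are hypothetical names introduced only for illustration. It steps to the first element with stepNext and fetches it with getElem.

```r
library(tm)

# Hypothetical scratch directory holding one small text file.
tmp <- file.path(tempdir(), "dirsource-modes")
dir.create(tmp, showWarnings = FALSE)
writeLines(c("first line", "second line"), file.path(tmp, "a.txt"))

# Illustrative helper (not part of tm): build a directory source with the
# given mode, advance to its first element, and return that element.
first_elem <- function(mode) {
  src <- DirSource(tmp, mode = mode)
  src <- stepNext(src)  # a new source is positioned before the first element
  getElem(src)
}

str(first_elem("text"))    # content read via readLines
str(first_elem("binary"))  # content read via readBin
str(first_elem(""))        # nothing is read; only the URI is delivered
```

Mode "" can be useful when a reader should open the files itself and only needs their locations.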

Details

A directory source acquires a list of files via dir and interprets each file as a document.
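
As a rough illustration of this behaviour, the sketch below builds a small temporary directory, lists it with dir using the same filter arguments, and then creates a corpus from a DirSource over the same directory. It assumes package tm is installed; all file and directory names are made up for the example.

```r
library(tm)

# Hypothetical directory tree: three .txt files (one in a subdirectory,
# one with an upper-case extension) and one file that should be filtered out.
tmp <- file.path(tempdir(), "dirsource-demo")
dir.create(file.path(tmp, "sub"), recursive = TRUE, showWarnings = FALSE)
writeLines("alpha",   file.path(tmp, "one.txt"))
writeLines("beta",    file.path(tmp, "TWO.TXT"))
writeLines("gamma",   file.path(tmp, "sub", "three.txt"))
writeLines("ignored", file.path(tmp, "notes.md"))

# Plain base R listing with the same filter arguments.
files <- dir(tmp, pattern = "\\.txt$", recursive = TRUE,
             ignore.case = TRUE, full.names = TRUE)

# Directory source over the same directory; each matching file
# becomes one document in the corpus.
src <- DirSource(tmp, pattern = "\\.txt$", recursive = TRUE,
                 ignore.case = TRUE)
corpus <- VCorpus(src)

length(files)   # number of matching files
length(corpus)  # number of documents; should agree with the line above
```

Without ignore.case = TRUE the pattern would skip TWO.TXT, and without recursive = TRUE the file in sub would not be listed.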

See Also

Source for basic information on the source infrastructure employed by package tm.

Encoding and iconv on encodings.

Examples

DirSource(system.file("texts", "txt", package = "tm"))