tika_json: Get json Metadata and XHTML Content

Package rtika (author: Sasha Goodman). Extract text or metadata from over a thousand file types, using Apache Tika <https://tika.apache.org/>. Get either plain text or structured XHTML content.

Description

Tika can parse and extract text from almost anything, including zip, tar, tar.bz2, and other archives that contain documents. If you have a zip file with 100 text files in it, you can get the text and metadata for each file nested inside the zip file. The JSON mode currently uses this recursive output (see https://wiki.apache.org/tika/RecursiveMetadata). The document content is XHTML in the "X-TIKA:content" field. If <code>output_dir</code> is passed through to <code>tika()</code>, the files written there will have the <code>.json</code> file extension.

Usage

tika_json(input, ...)

Arguments

input
Character vector describing the paths and/or URLs of the input documents.

...
Other parameters to be sent to <code>tika()</code>.

Value

Examples
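A minimal sketch of reading the JSON, assuming the jsonlite package is installed. Following the recursive layout described above, it treats a JSON string as an array with one object per document (typically the container first, then each embedded file) and pulls one field, such as "X-TIKA:content", out of every object. The helper extract_field() and the sample string are hypothetical, written by hand so the example runs without Tika; real output carries many more metadata keys.

```r
# Sketch: extract one field from every document in a recursive-metadata
# JSON string. Assumes the jsonlite package is installed.
library(jsonlite)

# Hypothetical helper (not part of rtika): returns the value of `field` for
# each document object, or NA when a document lacks that field.
# Multi-valued fields (JSON arrays) are collapsed into one "; "-separated string.
extract_field <- function(json_string, field) {
  docs <- fromJSON(json_string, simplifyVector = FALSE)  # list of named lists
  vapply(docs, function(d) {
    value <- d[[field]]
    if (is.null(value)) NA_character_ else paste(unlist(value), collapse = "; ")
  }, character(1))
}

# Hand-written sample in the recursive layout: a zip container, then one
# text file nested inside it. Real Tika output has many more keys.
json <- '[
  {"Content-Type": "application/zip",
   "X-TIKA:content": "<html><body></body></html>"},
  {"Content-Type": "text/plain",
   "X-TIKA:content": "<html><body><p>hello</p></body></html>"}
]'

types <- extract_field(json, "Content-Type")
content <- extract_field(json, "X-TIKA:content")
print(types)

# With rtika and Apache Tika installed, real input would come from, e.g.:
# json <- rtika::tika_json("archive.zip")  # hypothetical file path
# content <- extract_field(json[1], "X-TIKA:content")
```

The helper parses with simplifyVector = FALSE so that every document stays a plain named list; this sidesteps data-frame simplification when documents carry different sets of metadata keys.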