split_documents

Split documents in a corpus into documents of one or more paragraphs.

An integrated solution to perform
a series of text mining tasks such as importing and cleaning a corpus, and
analyses like terms and documents counts, lexical summary, terms
co-occurrences and documents similarity measures, graphs of terms,
correspondence analysis and hierarchical clustering. Corpora can be imported
from spreadsheet-like files, directories of raw text files,
as well as from 'Dow Jones Factiva', 'LexisNexis', 'Europresse' and 'Alceste' files.

Milan Bouchet-Valat

R.temis

Integrated Text Mining Solution

Gilles Bastin

Antoine Chollet

split_documents function

Arguments

<dl>

<dt>corpus</dt>
<dd>A <code>Corpus</code> object.</dd>


<dt>chunksize</dt>
<dd>The number of paragraphs each new document should contain at most.</dd>


<dt>preserveMetadata</dt>
<dd>Whether to preserve the meta-data of original documents.</dd>

</dl>
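The effect of `chunksize` can be illustrated with a small base R sketch. This is not the R.temis implementation: `chunk_paragraphs` is a hypothetical helper, and it assumes that a document is already available as a character vector with one element per paragraph.

```r
# Hypothetical helper, not part of R.temis: groups a character vector of
# paragraphs into chunks of at most `chunksize` paragraphs each.
chunk_paragraphs <- function(paragraphs, chunksize) {
  stopifnot(chunksize >= 1)
  # Paragraph i is assigned to group ceiling(i / chunksize)
  groups <- ceiling(seq_along(paragraphs) / chunksize)
  unname(split(paragraphs, groups))
}

paragraphs <- c("First paragraph.", "Second paragraph.", "Third paragraph.",
                "Fourth paragraph.", "Fifth paragraph.")

# Each element of `chunks` stands for one new, shorter document
chunks <- chunk_paragraphs(paragraphs, chunksize = 2)
lengths(chunks)  # 2 2 1
```

With five paragraphs and `chunksize = 2`, the last chunk holds the single remaining paragraph, which is why the argument is described as a maximum ("at most"). With the package itself, the corresponding call would pass a `Corpus` object along with `chunksize` and `preserveMetadata`.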
