These functions are provided for compatibility with older versions of corpus only, and may be defunct as soon as the next release.
term_counts(x, filter = text_filter(x), weights = NULL,
    ngrams = NULL, min_count = NULL, max_count = NULL,
    min_support = NULL, max_support = NULL, types = FALSE)
x: a text vector to tokenize.
filter: a token filter specifying the tokenization rules.
weights: a numeric vector the same length as x assigning weights to each text, or NULL for unit weights.
ngrams: an integer vector of n-gram lengths to include, or NULL for length-1 n-grams only.
min_count: a numeric scalar giving the minimum term count to include in the output, or NULL for no minimum count.
max_count: a numeric scalar giving the maximum term count to include in the output, or NULL for no maximum count.
min_support: a numeric scalar giving the minimum term support to include in the output, or NULL for no minimum support.
max_support: a numeric scalar giving the maximum term support to include in the output, or NULL for no maximum support.
types: a logical value indicating whether to include columns for the types that make up the terms.
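As a brief illustration of these arguments, here is a minimal sketch; the texts are invented, the corpus package is assumed to be attached, and because term_counts is deprecated the call may emit a deprecation warning:

library(corpus)

x <- c("A rose is a rose is a rose.",
       "A thorn is a thorn.")

# unigrams and bigrams that occur at least twice, with type columns included
term_counts(x, ngrams = 1:2, min_count = 2, types = TRUE)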
The term_counts function is deprecated; it has been renamed to term_stats.
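For new code, the rename is a direct substitution. The sketch below reuses x from the example above and only the filtering arguments documented here; consult the term_stats help page to confirm that every term_counts argument (for example weights) is still accepted:

# old, deprecated spelling
term_counts(x, ngrams = 1:2, min_count = 2)

# current spelling, which should give the same result
term_stats(x, ngrams = 1:2, min_count = 2)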