
corpus (version 0.5.1)

term_counts: Term Frequencies

Description

Tokenize a set of texts and tabulate the term occurrence frequencies.

Usage

term_counts(x, filter = text_filter(), weights = NULL)

Arguments

x

a text vector to tokenize.

filter

a text_filter specifying the tokenization rules.

weights

a numeric vector the same length as x assigning a weight to each text, or NULL for unit weights.

Value

A data frame with two columns, term and count, with one row for each term appearing in the texts. Rows are sorted in descending order by count, with ties broken arbitrarily.
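
Because the rows are sorted by count, the most frequent terms can be read directly off the top of the result. A minimal sketch, assuming the corpus package is attached with library("corpus") (the exact terms returned depend on the filter settings):

    counts <- term_counts("A rose is a rose is a rose.")
    head(counts, 3)  # the three most frequent terms
    counts$term[1]   # the single most frequent term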

Details

term_counts tokenizes a set of texts and computes the occurrence counts for each term. If weights is non-NULL, then each token in text i increments the count for the corresponding term by weights[i]; otherwise, each appearance increments the count by one.
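
A minimal sketch of this rule, assuming the corpus package is attached with library("corpus"): with a constant weight of 2 for every text, each count is exactly twice its unweighted value.

    x <- c("A rose is a rose is a rose.",
           "A Rose is red, a violet is blue!")
    term_counts(x)                               # unit weights
    term_counts(x, weights = rep(2, length(x)))  # every count doubled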

See Also

tokens, term_matrix.

Examples

    term_counts("A rose is a rose is a rose.")

    # remove punctuation and stop words
    term_counts("A rose is a rose is a rose.",
                text_filter(drop_symbol = TRUE, drop = stopwords("english")))

    # weight the texts
    term_counts(c("A rose is a rose is a rose.",
                  "A Rose is red, a violet is blue!"),
                weights = c(100, 1))
