textcat_xdist: Cross-Distances Between $N$-Gram Profiles

Description

Compute cross-distances between collections of $n$-gram profiles.

Usage

textcat_xdist(x, p = NULL, method = "CT", ..., options = list())

Arguments

x

a textcat profile db (see textcat_profile_db), or an R object of text documents extractable via as.character.

p

NULL (default), or as for x. The default is equivalent to taking p as x (but more efficient).

method

a character string specifying a built-in method, or a user-defined function for computing distances between $n$-gram profiles, or NULL (corresponding to the current value of textcat option xdist_method (see

...

options to be passed to the method for computing distances.

options

a list of such options.

Details

If x (or p) is not a profile db, the $n$-gram profiles of the individual text documents extracted from it are computed using the profile method and options in p if this is a profile db, and using the current textcat profile method and options otherwise.
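As a minimal sketch of the case described above, the following passes plain text documents as x and a profile db as p, so the document profiles should be computed with the method and options stored in p. The two sample sentences are made up for illustration; TC_byte_profiles is the profile db used in the Examples section.

```r
library(textcat)

## Two illustrative documents (made-up sample text).
docs <- c("This is an English sentence.",
          "Das ist ein deutscher Satz.")

## x is not a profile db, so its n-gram profiles are computed using
## the profile method and options of p (here: TC_byte_profiles).
d <- textcat_xdist(docs, TC_byte_profiles)

## One row per document in x.
nrow(d)
```

Each row holds the distances from one document to the profiles in p, so the closest profile for a document can be looked up with which.min on that row.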

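Currently, the following distance methods for $n$-gram profiles are available.

"CT"

the out-of-place measure of Cavnar and Trenkle.

"ranks"

a variant of the Cavnar and Trenkle measure based on the aggregated absolute difference of the ranks of the combined $n$-grams in the two profiles.

"ALPD"

the sum of the absolute differences in $n$-gram log frequencies.

"KLI"

the Kullback-Leibler I-divergence of the $n$-gram frequency distributions.

"KLJ"

the Kullback-Leibler J-divergence, the symmetrized variant of the I-divergence.

"JS"

the Jensen-Shannon divergence.

"cosine"

the cosine dissimilarity between the profiles.

"Dice"

the Dice dissimilarity between the profiles.

For the measures based on distances of frequency distributions, the $n$-grams of the two profiles are combined, and missing $n$-grams are given a small positive absolute frequency. This frequency is controlled by the option eps and defaults to 1e-6.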
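The handling of missing $n$-grams can be illustrated in base R. The sketch below is not the package's implementation: align_profiles and kl_divergence are hypothetical helpers, and the toy profiles p1 and p2 are made up. It only shows the idea of combining the $n$-grams of two profiles and filling gaps with eps before computing a divergence between the resulting frequency distributions.

```r
## Hypothetical helper (not part of textcat): align two named frequency
## vectors on the union of their n-grams, filling gaps with eps.
align_profiles <- function(a, b, eps = 1e-6) {
  grams <- union(names(a), names(b))
  fill <- function(v) {
    out <- setNames(rep(eps, length(grams)), grams)
    out[names(v)] <- v
    out
  }
  list(a = fill(a), b = fill(b))
}

## Hypothetical helper: Kullback-Leibler divergence between the
## normalized, aligned profiles.
kl_divergence <- function(a, b, eps = 1e-6) {
  ab <- align_profiles(a, b, eps)
  p <- ab$a / sum(ab$a)
  q <- ab$b / sum(ab$b)
  sum(p * log(p / q))
}

## Toy profiles with partly overlapping n-grams (made-up counts).
p1 <- c("_t" = 5, "th" = 4, "he" = 3)
p2 <- c("th" = 6, "he" = 2, "e_" = 1)

kl_divergence(p1, p2)
```

Without the eps fill, an $n$-gram present in only one profile would produce a zero frequency and hence an infinite or undefined divergence; a smaller eps penalizes such $n$-grams more heavily.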

Options given in ... and options are combined, and merged with the default xdist options specified by the textcat option xdist_options using exact name matching.
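A short sketch of the two ways to supply method options, assuming the built-in method name "KLI" and the option eps described above. Since options given in ... and in options are combined, both calls are expected to give the same result; the eps value is chosen only for illustration.

```r
library(textcat)

## Pass the option directly through '...'.
d1 <- textcat_xdist(TC_byte_profiles, method = "KLI", eps = 1e-4)

## Pass the same option through the 'options' list.
d2 <- textcat_xdist(TC_byte_profiles, method = "KLI",
                    options = list(eps = 1e-4))

## Both routes should yield the same distances.
all.equal(d1, d2)
```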

Examples

## Compute cross-distances between the TextCat byte profiles using the
## CT out-of-place measure.
d <- textcat_xdist(TC_byte_profiles)
## Visualize results of hierarchical cluster analysis on the distances.
plot(hclust(as.dist(d)), cex = 0.7)