textcat_xdist(x, p = NULL, method = "CT", ..., options = list())

x: a textcat profile db (see textcat_profile_db), or an R object of
text documents extractable via as.character.
p: NULL (default), or as for x. The default is equivalent to taking
p as x (but more efficient).
method: a character string naming a built-in distance method, or NULL
(corresponding to the current value of textcat option xdist_method;
see textcat_options). See Details for the available built-in methods.

If x (or p) is not a profile db, the $n$-gram profiles of the
individual text documents extracted from it are computed using the
profile method and options in p if this is a profile db, and using the
current textcat profile method and options otherwise.

Currently, the following distance methods for $n$-gram profiles are
available:
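"CT": the out-of-place measure of Cavnar and Trenkle.
"ranks": a variant of the CT measure based on the aggregated absolute
difference of the ranks of the combined $n$-grams in the two profiles.
"ALPD": the sum of the absolute differences in $n$-gram log frequencies.
"KLI": the Kullback-Leibler I-divergence
$I(p, q) = \sum_i p_i \log(p_i/q_i)$ of the $n$-gram frequency
distributions $p$ and $q$ of the two profiles.
"KLJ": the Kullback-Leibler J-divergence
$J(p, q) = \sum_i (p_i - q_i) \log(p_i/q_i)$, the symmetrized variant
$I(p, q) + I(q, p)$ of the I-divergence.
"JS": the Jensen-Shannon divergence between the $n$-gram frequency
distributions.
"cosine": the cosine dissimilarity between the profiles, i.e., one minus
the inner product of the frequency vectors normalized to Euclidean
length one.
"Dice": the Dice dissimilarity between the sets of $n$-grams in the two
profiles.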
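To make the rank-based and set-based measures more concrete, here is a simplified base-R sketch. It is not textcat's implementation: the helpers ngram_profile(), out_of_place(), dice_dissim() and cosine_dissim() are hypothetical, profiles are plain named vectors of absolute frequencies built without any of textcat's preprocessing, and the penalty charged for $n$-grams missing from the reference profile (by default, the length of that profile) is an assumption.

```r
# Simplified, hypothetical helpers; not textcat's internal code.

# Character n-gram profile of one string: a named vector of absolute
# frequencies (no padding or other preprocessing).
ngram_profile <- function(text, n = 1:3) {
  chars <- strsplit(text, "")[[1]]
  grams <- unlist(lapply(n, function(k) {
    if (length(chars) < k) return(character(0))
    vapply(seq_len(length(chars) - k + 1),
           function(i) paste(chars[i:(i + k - 1)], collapse = ""),
           character(1))
  }))
  tab <- table(grams)
  setNames(as.numeric(tab), names(tab))
}

# Out-of-place measure in the style of Cavnar and Trenkle: rank n-grams
# by decreasing frequency, add up the rank displacements, and charge
# max_penalty (an assumed value) for document n-grams absent from ref.
out_of_place <- function(doc, ref, max_penalty = length(ref)) {
  r_doc <- rank(-doc, ties.method = "first")
  r_ref <- rank(-ref, ties.method = "first")
  disp <- abs(r_doc - r_ref[names(doc)])  # NA where ref lacks the n-gram
  disp[is.na(disp)] <- max_penalty
  sum(disp)
}

# Dice dissimilarity between the sets of n-grams in two profiles.
dice_dissim <- function(p, q) {
  1 - 2 * length(intersect(names(p), names(q))) / (length(p) + length(q))
}

# Cosine dissimilarity: one minus the cosine of the angle between the
# frequency vectors, using zeros for n-grams missing from one profile.
cosine_dissim <- function(p, q) {
  grams <- union(names(p), names(q))
  u <- p[grams]
  v <- q[grams]
  u[is.na(u)] <- 0
  v[is.na(v)] <- 0
  1 - sum(u * v) / sqrt(sum(u^2) * sum(v^2))
}
```

Ties in the ranking are broken by position (ties.method = "first"), which is only one of several reasonable choices; note also that out_of_place() is not symmetric in its two arguments.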
For the measures based on distances between frequency distributions,
the $n$-grams of the two profiles are combined, and missing $n$-grams
are given a small positive absolute frequency, which is controlled by
option eps and defaults to 1e-6.
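The eps handling can be sketched in base R as follows, assuming that the padded absolute frequencies are rescaled to relative frequencies before a divergence is computed. The helpers align_with_eps(), kli() and klj() are hypothetical and cover only the I- and J-divergences, using the standard definitions $I(p, q) = \sum_i p_i \log(p_i/q_i)$ and $J(p, q) = I(p, q) + I(q, p)$; textcat's own code may differ in detail.

```r
# Hypothetical helper (assumed behaviour): align two absolute-frequency
# profiles on the union of their n-grams, give n-grams missing from one
# profile the absolute frequency eps, then rescale to relative frequencies.
align_with_eps <- function(p, q, eps = 1e-6) {
  grams <- union(names(p), names(q))
  fp <- p[grams]
  fq <- q[grams]
  fp[is.na(fp)] <- eps
  fq[is.na(fq)] <- eps
  list(p = unname(fp / sum(fp)), q = unname(fq / sum(fq)))
}

# Kullback-Leibler I-divergence: sum_i p_i * log(p_i / q_i).
kli <- function(p, q, eps = 1e-6) {
  d <- align_with_eps(p, q, eps)
  sum(d$p * log(d$p / d$q))
}

# Kullback-Leibler J-divergence: sum_i (p_i - q_i) * log(p_i / q_i),
# which equals kli(p, q) + kli(q, p).
klj <- function(p, q, eps = 1e-6) {
  d <- align_with_eps(p, q, eps)
  sum((d$p - d$q) * log(d$p / d$q))
}
```

Without the padding, an $n$-gram that occurs in p but not in q would make $\log(p_i/q_i)$ infinite, which is why missing $n$-grams need some positive frequency.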
Options given in ... and in options are combined and then merged with
the default xdist options specified by textcat option xdist_options,
using exact name matching.
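The option handling can be pictured with base R's modifyList(), which matches names exactly. In this sketch, default_xdist_options and merge_xdist_options() are hypothetical stand-ins, not textcat objects, and the precedence rule when a name appears in both ... and options (the value from ... wins, because [[ returns the first match) is a property of the sketch, not documented textcat behaviour.

```r
# Hypothetical stand-in for the value of textcat option xdist_options.
default_xdist_options <- list(eps = 1e-6)

# Combine the options from ... and options, then merge them into the
# defaults. modifyList() matches names exactly, so an abbreviated or
# misspelt name is added as a new entry instead of overriding a default.
merge_xdist_options <- function(..., options = list(),
                                defaults = default_xdist_options) {
  user <- c(list(...), options)
  modifyList(defaults, user)
}
```

For example, merge_xdist_options(ep = 1) leaves eps at 1e-6 and simply adds an entry named ep.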
library("textcat")
## Compute cross-distances between the TextCat byte profiles using the
## CT out-of-place measure.
d <- textcat_xdist(TC_byte_profiles)
## Visualize results of hierarchical cluster analysis on the distances.
plot(hclust(as.dist(d)), cex = 0.7)