lingmatch (version 1.0.7)

report_term_matches: Generate a Report of Term Matches

Description

Extract matches to fuzzy terms (globs/wildcards or regular expressions) from provided text, in order to assess their appropriateness for inclusion in a dictionary.
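As a rough illustration of what "extracting matches" involves, the base R sketch below pulls every match to one regular expression out of a text vector and tallies the distinct variants. It is not lingmatch's implementation: the pattern "\\bsad\\w*" is an assumed regex form of the glob "sad*", and the tallies only loosely correspond to the count and variants columns described under Value.

```r
# Sketch only (not lingmatch's code): extract all matches to one pattern
# and count how often each distinct variant occurs.
text <- c(
  "I am sadly homeless, and suffering from depression :(",
  "I feel weightless now that my sadness has been depressed! :()"
)
text <- tolower(text)  # mirrors the tolower = TRUE default

# assumed regex form of the glob "sad*": word start, then any word characters
pattern <- "\\bsad\\w*"

# gregexpr() locates every non-overlapping match in each element;
# regmatches() returns the matched substrings as a list, flattened here
matches <- unlist(regmatches(text, gregexpr(pattern, text, perl = TRUE)))

# one entry per distinct variant, most frequent first
variant_counts <- sort(table(matches), decreasing = TRUE)
total_matches <- sum(variant_counts)     # roughly the count column
n_variants <- length(variant_counts)     # roughly the variants column
variant_counts
```

Inspecting variant_counts this way shows which words a fuzzy term actually picks up, which is the kind of check the function is meant to support.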

Usage

report_term_matches(dict, text = NULL, space = NULL, glob = TRUE,
  parse_phrases = TRUE, tolower = TRUE, punct = TRUE, special = TRUE,
  as_terms = FALSE, bysentence = FALSE, as_string = TRUE,
  term_map_freq = 1, term_map_spaces = 1, outFile = NULL,
  space_dir = getOption("lingmatch.lspace.dir"), verbose = TRUE)

Value

A data.frame of results, with a row for each unique term, and the following columns:

  • term: The originally entered term.

  • regex: The converted and applied regular expression form of the term.

  • categories: Comma-separated category names, if dict is a list with named entries.

  • count: Total number of matches to the term.

  • max_count: Number of matches to the term's most representative variant (the variant with the highest average similarity).

  • variants: Number of variants of the term.

  • space: Name of the latent semantic space, if one was used.

  • mean_sim: Average similarity of the term's variants found in the space to the most representative variant, if a space was used.

  • min_sim: Minimum similarity to the most representative variant.

  • matches: Variants of the term, with their counts and, if a space was specified, their similarity (Pearson's r) to the most representative variant. Returned as a comma-separated string, or as a data.frame if as_string is FALSE.
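The similarity-based columns can be sketched in base R with a toy space. This is a hedged approximation, not lingmatch's code: the space values and counts are made up, and it assumes that "average similarity" means each variant's mean Pearson correlation with the other variants (excluding itself) and that mean_sim and min_sim are taken over the other variants only.

```r
# Hypothetical toy "space": rows are variants, columns are dimensions.
# Values are invented for illustration; real spaces come from select.lspace.
space <- rbind(
  sadly   = c(0.9, 0.1, 0.3, 0.5),
  sadness = c(0.8, 0.2, 0.4, 0.6),
  sadder  = c(0.1, 0.9, 0.7, 0.2)
)
counts <- c(sadly = 3, sadness = 1, sadder = 1)  # made-up match counts

# Pearson correlations between variants; cor() works on columns, so transpose
sims <- cor(t(space))

# assumption: average similarity excludes each variant's self-correlation
diag(sims) <- NA
avg_sim <- rowMeans(sims, na.rm = TRUE)

# most representative variant: highest average similarity
rep_term <- names(which.max(avg_sim))
other <- setdiff(rownames(sims), rep_term)

summary_row <- data.frame(
  count = sum(counts),
  max_count = counts[[rep_term]],
  variants = length(counts),
  mean_sim = mean(sims[rep_term, other]),
  min_sim = min(sims[rep_term, other])
)
summary_row
```

In this toy case "sadder" correlates negatively with the other two variants, which pulls min_sim down; a low min_sim may flag a variant that does not belong under the term.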

Arguments

dict

A vector of terms, list of such vectors, or a matrix-like object to be categorized by read.dic.

text

A vector of text to extract matches from. If not specified, will use the terms in the term_map retrieved from select.lspace.

space

A vector space used to calculate similarities between term matches: the name of a space (see select.lspace), a matrix with terms as row names, or TRUE to auto-select a space based on matched terms.

glob

Logical; if TRUE, converts globs (asterisk wildcards) to regular expressions. If not specified, this will be set automatically.
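To show what converting a glob to a regular expression can look like, here is a small hypothetical helper, glob_to_regex(). It is not the conversion lingmatch applies (the regex column of the result shows the actual form); it assumes each asterisk means "any run of word characters" and that word boundaries are added at unstarred ends that begin or end with a word character.

```r
# Hypothetical helper (not part of lingmatch): convert an asterisk glob
# into a PCRE regular expression.
glob_to_regex <- function(glob) {
  # escape regex metacharacters other than *, so terms like ":(" match literally
  rx <- gsub("([.|()^{}+$?\\[\\]\\\\])", "\\\\\\1", glob, perl = TRUE)
  # each asterisk becomes "zero or more word characters"
  rx <- gsub("\\*", "\\\\w*", rx, perl = TRUE)
  # add word boundaries where the glob starts or ends with a word character
  if (grepl("^\\w", glob, perl = TRUE)) rx <- paste0("\\b", rx)
  if (grepl("\\w$", glob, perl = TRUE)) rx <- paste0(rx, "\\b")
  rx
}

glob_to_regex("sad*")   # prefix glob
glob_to_regex("*less")  # suffix glob
glob_to_regex(":(")     # emoticon, matched literally

grepl(glob_to_regex("sad*"), c("sadly", "unsad"), perl = TRUE)
```

The word boundaries keep "sad*" from matching inside words such as "unsad"; base R's utils::glob2rx() is an alternative, but it anchors to the whole string rather than to word boundaries.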

parse_phrases

Logical; if TRUE (default) and space is specified, phrases not found in the space are broken into single terms, and the vectors of any matched terms are averaged.
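The phrase-averaging idea can be sketched with a toy space. The space values below are invented, and the sketch assumes a simple unweighted mean of the matched word vectors; lingmatch's handling may differ in detail.

```r
# Hypothetical toy space (made-up values); the phrase "feel down" has no row,
# but both of its words do.
space <- rbind(
  feel = c(0.2, 0.6, 0.1),
  down = c(0.8, 0.0, 0.5),
  sad  = c(0.7, 0.1, 0.6)
)

phrase <- "feel down"
words <- strsplit(phrase, " ", fixed = TRUE)[[1]]

# keep only the words that appear in the space
found <- words[words %in% rownames(space)]

# average the matched vectors to stand in for the phrase
phrase_vector <- colMeans(space[found, , drop = FALSE])

# the averaged vector can then be compared like any other term
phrase_sim <- cor(phrase_vector, space["sad", ])
phrase_vector
phrase_sim
```

drop = FALSE keeps the selection a matrix even when only one word is found, so colMeans() still works.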

tolower

Logical; if FALSE, will retain text's case.

punct

Logical; if FALSE, will remove punctuation marks from text.

special

Logical; if FALSE, will attempt to replace special characters in text.
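A rough sketch of how the tolower and punct options change a string, using base R's tolower() and the POSIX [[:punct:]] class as stand-ins. lingmatch's own preprocessing may differ, and the special option is not illustrated here.

```r
x <- "This wholesome happiness brings joy to my heart! :D:D:D"

# roughly what tolower = TRUE does
lowered <- tolower(x)

# roughly what punct = FALSE does (sketch: drop all punctuation characters)
no_punct <- gsub("[[:punct:]]", "", x)

lowered
no_punct

# emoticon terms such as "d:" can match the lowered text...
grepl("d:", lowered, fixed = TRUE)
# ...but nothing with a colon survives punctuation removal
grepl(":", no_punct, fixed = TRUE)
```

This is why dictionaries with emoticon terms like ":(" or "d:" depend on punctuation being kept.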

as_terms

Logical; if TRUE, will treat text as terms, meaning dict terms will only count as matches when matching the complete text.
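The difference between matching anywhere in the text and matching the complete text can be sketched by anchoring a pattern. The regex "\\bsad\\w*" is an assumed form of "sad*", and wrapping it in ^(?:...)$ is an illustrative stand-in for as_terms = TRUE, not necessarily how lingmatch implements it.

```r
rx <- "\\bsad\\w*"  # assumed regex form of the glob "sad*"

# as_terms = FALSE (sketch): a match anywhere in the text counts
grepl(rx, "I am sadly homeless", perl = TRUE)

# as_terms = TRUE (sketch): the pattern must cover the entire string
full <- paste0("^(?:", rx, ")$")
terms <- c("sadly", "sadness", "unsad")
grepl(full, terms, perl = TRUE)
grepl(full, "sadly homeless", perl = TRUE)
```

The non-capturing group (?:...) keeps the anchors applying to the whole pattern even if it contains alternation.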

bysentence

Logical; if TRUE, will split text into sentences, and only consider unique sentences.
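Splitting into sentences and keeping only unique ones can be approximated as below. The split rule (break after ., !, or ? followed by whitespace) is a simplifying assumption; lingmatch's sentence splitting may handle more cases.

```r
text <- c("I am sad. I am sad. Joy came later!", "I am sad.")

# split after sentence-ending punctuation followed by whitespace;
# the lookbehind keeps the punctuation attached to its sentence
sentences <- unlist(strsplit(text, "(?<=[.!?])\\s+", perl = TRUE))

# repeated sentences, within or across texts, are counted once
unique_sentences <- unique(sentences)
unique_sentences
```

Deduplicating this way keeps repeated boilerplate sentences from inflating match counts.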

as_string

Logical; if FALSE, returns matches as tables rather than a string.

term_map_freq

Proportion of terms to include when using the term map as a source of terms. Applies when text is not specified.

term_map_spaces

Number of spaces in which a term has to appear to be included. Applies when text is not specified.

outFile

Path of a file to write results to; the written file always ends in .csv.

space_dir

Directory from which space should be loaded.

verbose

Logical; if FALSE, will not display status messages.

See Also

For a more complete assessment of dictionaries, see dictionary_meta().

Similar information is provided in the dictionary builder web tool.

Other Dictionary functions: dictionary_meta(), download.dict(), lma_patcat(), lma_termcat(), read.dic(), select.dict()

Examples

text <- c(
  "I am sadly homeless, and suffering from depression :(",
  "This wholesome happiness brings joy to my heart! :D:D:D",
  "They are joyous in these fearsome happenings D:",
  "I feel weightless now that my sadness has been depressed! :()"
)
dict <- list(
  sad = c("*less", "sad*", "depres*", ":("),
  happy = c("*some", "happ*", "joy*", "d:"),
  self = c("i *", "my *")
)

report_term_matches(dict, text)
