Learn R Programming

lingmatch (version 1.0.7)

lma_process: Process Text

Description

A wrapper to other pre-processing functions, potentially from read.segments, to lma_dtm or lma_patcat, to lma_weight, then lma_termcat or lma_lspace, and optionally including lma_meta output.

Usage

lma_process(input = NULL, ..., meta = TRUE, coverage = FALSE)

Value

A matrix with texts represented by rows, and features in columns, unless there are multiple rows per output (e.g., when a latent semantic space is applied without terms being mapped) in which case only the special output is returned (e.g., a matrix with terms as rows and latent dimensions in columns).

Arguments

input

A vector of text, or path to a text file or folder.

...

arguments to be passed to lma_dtm, lma_patcat, lma_weight, lma_termcat, and/or lma_lspace. All arguments must be named.

meta

Logical; if FALSE, metastatistics are not included. Only applies when raw text is available. If included, meta categories are added as the last columns, with names starting with "meta_".

coverage

Logical; if TRUE and a dictionary is provided (dict), will calculate the coverage (number of unique term matches) of each dictionary category.

See Also

If you just want to compare texts, see the lingmatch() function.

Examples

Run this code
# starting with some texts in a vector
texts <- c(
  "Firstly, I would like to say, and with all due respect...",
  "Please, proceed. I hope you feel you can speak freely...",
  "Oh, of course, I just hope to be clear, and not cause offense...",
  "Oh, no, don't monitor yourself on my account..."
)

# by default, term counts and metastatistics are returned
lma_process(texts)

# add dictionary and percent arguments for standard dictionary-based results
lma_process(texts, dict = lma_dict(), percent = TRUE)

# add space and weight arguments for standard word-centroid vectors
lma_process(texts, space = lma_lspace(texts), weight = "tfidf")

Run the code above in your browser using DataLab