lma_process: Process Text

Description

A wrapper around other pre-processing functions that passes text through a chain of them: potentially starting with read.segments, then lma_dtm or lma_patcat, then lma_weight, then lma_termcat or lma_lspace, optionally adding lma_meta output.
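To make the count, weight, and categorize steps of that chain concrete, here is a minimal base R sketch. The helpers toy_dtm, toy_tfidf, and toy_termcat are hypothetical stand-ins, not lingmatches functions. Tokenization is a plain lowercase split, and the weight is one common tf-idf variant (count times log(N / document frequency)), which may differ from what lma_weight computes.

```r
# Minimal sketch of a count -> weight -> category pipeline in base R.
# toy_dtm, toy_tfidf, and toy_termcat are hypothetical helpers, NOT lingmatches code.

# Document-term matrix: texts in rows, unique tokens in columns.
toy_dtm <- function(texts) {
  # lowercase, replace anything but letters, apostrophes, and spaces, then split
  tokens <- strsplit(gsub("[^a-z' ]", " ", tolower(texts)), "\\s+")
  tokens <- lapply(tokens, function(t) t[nzchar(t)])  # drop empty strings
  vocab <- sort(unique(unlist(tokens)))
  counts <- vapply(tokens, function(t) as.numeric(table(factor(t, levels = vocab))),
                   numeric(length(vocab)))
  matrix(t(counts), nrow = length(texts), dimnames = list(NULL, vocab))
}

# One tf-idf variant (an assumption, not necessarily lma_weight's formula).
toy_tfidf <- function(dtm) {
  df <- colSums(dtm > 0)        # number of texts containing each term
  idf <- log(nrow(dtm) / df)    # terms found in every text get weight 0
  sweep(dtm, 2, idf, `*`)
}

# Sum term columns into dictionary categories (exact term matches only).
toy_termcat <- function(dtm, dict) {
  scores <- vapply(dict, function(terms) {
    hit <- intersect(terms, colnames(dtm))
    rowSums(dtm[, hit, drop = FALSE])
  }, numeric(nrow(dtm)))
  matrix(scores, nrow = nrow(dtm), dimnames = list(NULL, names(dict)))
}

texts <- c("I hope you feel you can speak freely",
           "Oh, of course, I just hope to be clear")
dict <- list(hope = c("hope", "feel"), self = "i")  # hypothetical mini-dictionary
counts <- toy_dtm(texts)
weighted <- toy_tfidf(counts)
toy_termcat(counts, dict)    # raw category counts
toy_termcat(weighted, dict)  # tf-idf weighted category scores
```

Because "hope" and "i" appear in both texts, weighting zeroes them out, so only "feel" contributes to the weighted hope category.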

Usage

lma_process(input = NULL, ..., meta = TRUE, coverage = FALSE)

Value

A matrix with texts as rows and features as columns, unless there are multiple rows per output (e.g., when a latent semantic space is applied without terms being mapped). In that case, only the special output is returned (e.g., a matrix with terms as rows and latent dimensions as columns).

Arguments

input: A vector of texts, or a path to a text file or folder.
...: Arguments to be passed to lma_dtm, lma_patcat, lma_weight, lma_termcat, and/or lma_lspace. All arguments must be named.
meta: Logical; if FALSE, metastatistics are not included. Only applies when raw text is available. If included, meta categories are added as the last columns, with names starting with "meta_".
coverage: Logical; if TRUE and a dictionary is provided (dict), calculates the coverage (number of unique term matches) of each dictionary category.
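Coverage counts how many distinct dictionary terms occur in a text, not how often they occur. A minimal base R sketch of that idea, assuming a document-term matrix with texts in rows and a dictionary given as a named list of exact terms; toy_coverage is a hypothetical helper, not what lma_process does internally, and it does no pattern matching.

```r
# Hypothetical helper: number of distinct dictionary terms matched per text and category.
toy_coverage <- function(dtm, dict) {
  cov <- vapply(dict, function(terms) {
    hit <- intersect(terms, colnames(dtm))   # exact matches only
    rowSums(dtm[, hit, drop = FALSE] > 0)    # count terms present, ignore frequency
  }, numeric(nrow(dtm)))
  matrix(cov, nrow = nrow(dtm), dimnames = list(rownames(dtm), names(dict)))
}

dtm <- matrix(c(2, 0, 1,
                0, 3, 0),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("text1", "text2"), c("hope", "feel", "clear")))
dict <- list(positive = c("hope", "feel", "glad"), clarity = "clear")
toy_coverage(dtm, dict)
```

Here text2 uses "feel" three times but covers only one positive term, whereas summing counts would give 3.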

Examples

# starting with some texts in a vector
texts <- c(
  "Firstly, I would like to say, and with all due respect...",
  "Please, proceed. I hope you feel you can speak freely...",
  "Oh, of course, I just hope to be clear, and not cause offense...",
  "Oh, no, don't monitor yourself on my account..."
)

# by default, term counts and metastatistics are returned
lma_process(texts)

# add dictionary and percent arguments for standard dictionary-based results
lma_process(texts, dict = lma_dict(), percent = TRUE)

# add space and weight arguments for standard word-centroid vectors
lma_process(texts, space = lma_lspace(texts), weight = "tfidf")