aifeducation (version 0.3.3)

bow_pp_create_vocab_draft: Function for creating a first draft of a vocabulary

Description

Function for creating a first draft of a vocabulary. It builds a list of tokens that carry specific universal part-of-speech (UPOS) tags and provides the corresponding lemmas.
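The selection step described above can be pictured with a small base R sketch. It is not the package's implementation: the annotation table is written by hand as a stand-in for udpipe output (its column names token, lemma, and upos follow udpipe's annotation data.frame), and the column names token_lower and lemma_lower are hypothetical.

```r
# Hand-written stand-in for a udpipe annotation result (assumption:
# real output has many more columns; only these three are needed here).
annotation <- data.frame(
  token = c("The", "Teachers", "plan", "good", "lessons", "quickly"),
  lemma = c("the", "teacher", "plan", "good", "lesson", "quickly"),
  upos  = c("DET", "NOUN", "VERB", "ADJ", "NOUN", "ADV"),
  stringsAsFactors = FALSE
)

# Tags to keep, matching the default of the upos argument.
upos_keep <- c("NOUN", "ADJ", "VERB")

# Keep only the rows whose tag is in the selected set.
selected <- annotation[annotation$upos %in% upos_keep, c("token", "lemma")]

# Add lower-case variants (hypothetical column names).
selected$token_lower <- tolower(selected$token)
selected$lemma_lower <- tolower(selected$lemma)

# Drop duplicate rows and reset the row names.
vocab_sketch <- unique(selected)
rownames(vocab_sketch) <- NULL

print(vocab_sketch)
```

Determiners and adverbs ("The", "quickly") are dropped because their tags are not in upos_keep; the remaining rows pair each token with its lemma.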

Usage

bow_pp_create_vocab_draft(
  path_language_model,
  data,
  upos = c("NOUN", "ADJ", "VERB"),
  label_language_model = NULL,
  language = NULL,
  chunk_size = 100,
  trace = TRUE
)

Value

list with the following components:

  • vocab: data.frame containing the tokens, lemmas, tokens in lower case, and lemmas in lower case.

  • ud_language_model: udpipe language model that is used for tagging.

  • label_language_model: Label of the udpipe language model.

  • language: Language of the raw texts.

  • upos: Universal part-of-speech tags used.

  • n_sentence: int Estimated number of sentences in the raw texts.

  • n_token: int Estimated number of tokens in the raw texts.

  • n_document_segments: int Estimated number of document segments/raw texts.
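A call might look like the following sketch. It assumes that the aifeducation and udpipe packages are installed and that an English udpipe model can be downloaded with udpipe::udpipe_download_model(); the example texts and the label "english-udpipe" are made up for illustration. The code is skipped when the packages or the model are unavailable.

```r
# Made-up raw texts for illustration.
raw_texts <- c(
  "Teachers plan their lessons carefully.",
  "Students solve difficult problems in small groups.",
  "Good feedback supports learning."
)

if (requireNamespace("aifeducation", quietly = TRUE) &&
    requireNamespace("udpipe", quietly = TRUE)) {
  # Download an English udpipe model into a temporary directory
  # (requires internet access).
  model_info <- udpipe::udpipe_download_model(
    language = "english",
    model_dir = tempdir()
  )

  if (!isTRUE(model_info$download_failed)) {
    vocab_draft <- aifeducation::bow_pp_create_vocab_draft(
      path_language_model = model_info$file_model,
      data = raw_texts,
      upos = c("NOUN", "ADJ", "VERB"),
      label_language_model = "english-udpipe", # hypothetical label
      language = "English",
      chunk_size = 100,
      trace = FALSE
    )

    # Inspect some of the documented components of the returned list.
    print(head(vocab_draft$vocab))
    print(vocab_draft$upos)
    print(vocab_draft$n_token)
  }
}
```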

Arguments

path_language_model

string Path to a udpipe language model that should be used for tagging and lemmatization.

data

vector containing the raw texts.

upos

vector containing the universal part-of-speech tags which should be used to build the vocabulary.

label_language_model

string Label for the udpipe language model used.

language

string Name of the language (e.g., English, German).

chunk_size

int Number of raw texts which should be processed at once.

trace

bool TRUE if information about the progress should be printed to console.
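To illustrate what chunk_size means, the following base R sketch splits a vector of raw texts into groups of at most chunk_size elements. How the package forms its chunks internally is not documented here, so this is an assumption about the general idea rather than a copy of its behavior; the texts are placeholders.

```r
# Placeholder raw texts (250 documents).
raw_texts <- paste("Document", seq_len(250))
chunk_size <- 100

# Assign each text to a chunk: texts 1-100 go to chunk 1,
# texts 101-200 to chunk 2, and so on.
chunk_id <- ceiling(seq_along(raw_texts) / chunk_size)
chunks <- split(raw_texts, chunk_id)

# Number of texts per chunk.
print(lengths(chunks))
```

A smaller chunk_size means fewer texts are handled per step, at the cost of more steps.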

See Also

Other Preparation: bow_pp_create_basic_text_rep()