Function for creating a first draft of a vocabulary
This function creates a list of tokens that belong to the selected universal part-of-speech (UPOS) tags and provides the corresponding lemmas.
bow_pp_create_vocab_draft(
path_language_model,
data,
upos = c("NOUN", "ADJ", "VERB"),
label_language_model = NULL,
language = NULL,
chunk_size = 100,
trace = TRUE
)
Returns a list with the following components.
vocab data.frame containing the tokens, lemmas, tokens in lower case, and lemmas in lower case.
ud_language_model udpipe language model that is used for tagging.
label_language_model Label of the udpipe language model.
language Language of the raw texts.
upos Universal part-of-speech tags used to build the vocabulary.
n_sentence int Estimated number of sentences in the raw texts.
n_token int Estimated number of tokens in the raw texts.
n_document_segments int Estimated number of document segments/raw texts.
Arguments:
path_language_model string Path to a udpipe language model that should be used for tagging and lemmatization.
data vector containing the raw texts.
upos vector containing the universal part-of-speech tags which should be used to build the vocabulary.
label_language_model string Label for the udpipe language model used.
language string Name of the language (e.g., English, German).
chunk_size int Number of raw texts which should be processed at once.
trace bool TRUE if information about the progress should be printed to console.
Other Preparation:
bow_pp_create_basic_text_rep()
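Example:
The following is a minimal usage sketch, not an official example. The example texts, the model path, and the label "english-ewt" are hypothetical placeholders. The sketch assumes that a udpipe model file has already been saved locally and that the package providing bow_pp_create_vocab_draft() has been attached with library(). The call is guarded so that nothing runs if either assumption does not hold.

```r
# Hypothetical raw texts, used only for illustration.
example_texts <- c(
  "The teacher explains the new topic to the students.",
  "Students write short essays about their learning goals."
)

# Placeholder path (assumption): point this at a udpipe model file on disk.
model_path <- "language_models/english-ewt-ud-2.5-191206.udpipe"

# Run only if the model file exists and the function is available,
# i.e. the package providing it has been attached with library().
if (file.exists(model_path) &&
    exists("bow_pp_create_vocab_draft", mode = "function")) {
  vocab_draft <- bow_pp_create_vocab_draft(
    path_language_model = model_path,
    data = example_texts,
    upos = c("NOUN", "ADJ", "VERB"),
    label_language_model = "english-ewt", # hypothetical label
    language = "English",
    chunk_size = 100,
    trace = FALSE
  )

  # Inspect the token/lemma table and the estimated number of tokens.
  print(head(vocab_draft$vocab))
  print(vocab_draft$n_token)
}
```

Setting trace = FALSE keeps the console quiet. For larger collections of texts, leaving it at TRUE makes the progress of the processing visible.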