This function calculates various association metrics (PMI, Dice's Coefficient, G-score) for bigrams in a given corpus.
calc_assoc_metrics( data, doc_index, token_index, type, association = "all", verbose = FALSE )
A data frame with one row per bigram and columns for each calculated metric.
data: A data frame containing the corpus.
doc_index: Column in 'data' that holds the document index.
token_index: Column in 'data' that holds the token index.
type: Column in 'data' that holds the tokens or terms.
association: A character vector specifying which metrics to calculate. Can be any combination of 'pmi', 'dice_coeff', 'g_score', or 'all'. Default is 'all'.
verbose: A logical value indicating whether to keep the intermediate probability columns in the output. Default is FALSE.
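For orientation, the three metrics are commonly defined as shown below for a bigram (x, y). This is a sketch of the textbook forms, not a statement of what calc_assoc_metrics computes internally: the logarithm base, the use of counts versus probabilities, and the exact G-score variant are all assumptions here. P denotes a relative frequency estimated from the corpus, f a raw count, and O_ij and E_ij the observed and expected cells of the 2x2 contingency table for x and y.

```latex
\begin{align*}
\mathrm{PMI}(x, y)  &= \log_2 \frac{P(x, y)}{P(x)\,P(y)} \\
\mathrm{Dice}(x, y) &= \frac{2\, f(x, y)}{f(x) + f(y)} \\
G^2(x, y)           &= 2 \sum_{i,j} O_{ij} \ln \frac{O_{ij}}{E_{ij}}
\end{align*}
```

Under these definitions, PMI is positive when a pair co-occurs more often than independence would predict, and the Dice coefficient ranges from 0 to 1.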
data_path <- system.file("extdata", "bigrams_data.rds", package = "qtkit")
data <- readRDS(data_path)
calc_assoc_metrics(data, doc_index, token_index, type)