
qtkit (version 1.1.1)

calc_assoc_metrics: Calculate Association Metrics for Bigrams

Description

This function calculates association metrics (pointwise mutual information (PMI), Dice's coefficient, and the G-score) for the bigrams in a given corpus.
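For reference, all three metrics are commonly defined from a 2x2 contingency table of bigram counts for a word pair (w1, w2). O11 counts w1 followed by w2, O12 counts w1 followed by any other word, O21 counts any other word followed by w2, and O22 counts all remaining bigrams. The base R sketch below uses these textbook definitions. The helper name assoc_from_counts is hypothetical and not part of qtkit, and qtkit's own implementation may differ in details such as the log base or how probabilities are estimated.

```r
# Hypothetical helper (not part of qtkit): textbook association measures
# for one bigram (w1, w2), computed from a 2x2 contingency table.
#   o11: w1 followed by w2        o12: w1 followed by another word
#   o21: another word then w2     o22: neither
assoc_from_counts <- function(o11, o12, o21, o22) {
  obs <- c(o11, o12, o21, o22)
  n <- sum(obs)                 # total number of bigrams
  r1 <- o11 + o12               # bigrams starting with w1
  r2 <- o21 + o22
  c1 <- o11 + o21               # bigrams ending with w2
  c2 <- o12 + o22
  # Expected counts under independence: row total * column total / n
  expected <- c(r1 * c1, r1 * c2, r2 * c1, r2 * c2) / n
  # Log-likelihood terms, treating 0 * log(0) as 0
  ll_terms <- ifelse(obs > 0, obs * log(obs / expected), 0)
  list(
    pmi        = log2(o11 / expected[1]),  # log2 of observed / expected
    dice_coeff = 2 * o11 / (r1 + c1),
    g_score    = 2 * sum(ll_terms)
  )
}

# Toy counts: the bigram "w1 w2" occurs 10 times among 100 bigrams
res <- assoc_from_counts(10, 5, 5, 80)
res$dice_coeff  # 2 * 10 / (15 + 15) = 2/3
```

When w1 and w2 occur together exactly as often as independence predicts (for example, all four cells equal to 1), PMI and the G-score are both 0.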

Usage

calc_assoc_metrics(
  data,
  doc_index,
  token_index,
  type,
  association = "all",
  verbose = FALSE
)

Value

A data frame with one row per bigram and columns for each calculated metric.

Arguments

data

A data frame containing the corpus.

doc_index

Unquoted name of the column in 'data' that holds the document index.

token_index

Unquoted name of the column in 'data' that holds the token index (the order of tokens).

type

Unquoted name of the column in 'data' that holds the tokens or terms.

association

A character vector specifying which metrics to calculate. Can be any combination of 'pmi', 'dice_coeff', 'g_score', or 'all'. Default is 'all'.

verbose

A logical value indicating whether to keep the intermediate probability columns. Default is FALSE.
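The doc_index, token_index, and type arguments describe a long, one-row-per-token layout. The base R sketch below builds a small hypothetical corpus in that shape and pairs adjacent tokens into bigrams before counting them. The helper make_bigrams and its x/y output columns are illustrative only, and the assumption that bigrams never span two documents is ours rather than something stated in the qtkit documentation.

```r
# Hypothetical toy corpus in a one-row-per-token layout
corpus <- data.frame(
  doc_index   = c(1, 1, 1, 2, 2, 2),
  token_index = c(1, 2, 3, 1, 2, 3),
  type        = c("the", "big", "dog", "the", "big", "cat"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pair each token with the next token in the same
# document (assumption: bigrams never cross document boundaries)
make_bigrams <- function(df) {
  df <- df[order(df$doc_index, df$token_index), ]
  per_doc <- lapply(split(df, df$doc_index), function(d) {
    n <- nrow(d)
    data.frame(
      doc_index = d$doc_index[-n],
      x = d$type[-n],   # first word of the bigram
      y = d$type[-1],   # second word of the bigram
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, per_doc)
  rownames(out) <- NULL
  out
}

bigrams <- make_bigrams(corpus)

# Frequency of each distinct bigram across the corpus
bigram_counts <- as.data.frame(
  table(bigram = paste(bigrams$x, bigrams$y)),
  stringsAsFactors = FALSE
)
```

Here "the big" is counted twice, while "dog the" never appears because the last token of document 1 is not paired with the first token of document 2.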

Examples

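library(qtkit)

# Load the example bigram data shipped with qtkit
data_path <- system.file("extdata", "bigrams_data.rds", package = "qtkit")
data <- readRDS(data_path)
# Columns are passed unquoted; all metrics are computed by default
calc_assoc_metrics(data, doc_index, token_index, type)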
