Helper function that uses dplyr to calculate joint and marginal probabilities for bigrams in the input data. It builds bigrams from the tokenized corpus, then computes the probability of each bigram along with the probability of each of its two tokens.
calculate_bigram_probabilities(data, doc_index, token_index, type)
Returns a data frame containing:
x: First token in bigram
y: Second token in bigram
p_xy: Joint probability of the bigram
p_x: Marginal probability of first token
p_y: Marginal probability of second token
data: A data frame containing the corpus
doc_index: Column name for document index
token_index: Column name for token position
type: Column name for the actual tokens/terms
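
The function's internals are not shown here, so the following is only a minimal sketch of how the returned columns could be derived with dplyr. The toy corpus, its column names (doc_id, position, token), and the choice to estimate marginals from overall token frequencies are all assumptions made for illustration; the real helper may count differently.

```r
library(dplyr)

# Hypothetical toy corpus: one row per token (column names are assumptions).
corpus <- data.frame(
  doc_id   = c(1, 1, 1, 2, 2),
  position = c(1, 2, 3, 1, 2),
  token    = c("the", "cat", "sat", "the", "cat")
)

# Pair each token (x) with its successor (y) within the same document.
bigrams <- corpus %>%
  arrange(doc_id, position) %>%
  group_by(doc_id) %>%
  mutate(y = lead(token)) %>%
  ungroup() %>%
  filter(!is.na(y)) %>%
  rename(x = token)

# Joint probability: relative frequency of each (x, y) pair among all bigrams.
joint <- bigrams %>%
  count(x, y, name = "n_xy") %>%
  mutate(p_xy = n_xy / sum(n_xy))

# Marginal probability: relative frequency of each token in the whole corpus
# (an assumption; marginals could also be taken over bigram positions).
unigram <- corpus %>%
  count(token, name = "n") %>%
  mutate(p = n / sum(n))

# Attach the marginals for the first and second token of every bigram.
result <- joint %>%
  left_join(select(unigram, x = token, p_x = p), by = "x") %>%
  left_join(select(unigram, y = token, p_y = p), by = "y") %>%
  select(x, y, p_xy, p_x, p_y)

print(result)
```

The result has one row per distinct bigram, with the same five columns listed above (x, y, p_xy, p_x, p_y), which is the shape needed for association measures such as pointwise mutual information.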