grammar_probs: Posterior Probabilities of Grammatical Types for Each Sign
Description
For each cuneiform sign in a sentence, computes Bayesian posterior
probabilities for all grammatical types, combining prior beliefs from
prior_probs with observed dictionary frequencies. The
dictionary counts are corrected for verb underrepresentation using the
sentence_prob stored in the prior.
Usage
grammar_probs(sg, prior, dic, alpha0 = 1)
Value
A data frame with columns:
position
Integer. Position of the sign in the sentence.
sign_name
Character. The sign name.
cuneiform
Character. The cuneiform character.
type
Character. The grammar type (e.g., "S", "V",
"Sx->S").
prob
Numeric. Posterior probability for this type at this
position.
n
Numeric. Dictionary count for this sign and type.
Arguments
sg
A data frame as returned by sign_grammar.
prior
A named numeric vector as returned by
prior_probs, with a sentence_prob attribute.
dic
A dictionary data frame as returned by
read_dictionary.
alpha0
Numeric (>= 0). Strength of the prior (pseudo sample
size). Larger values pull the posterior towards the prior. When
alpha0 = 0, the result is purely data-driven. Default: 1.
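To illustrate the effect of alpha0, the sketch below (plain R, independent of the package; the prior and counts are made up) shows how the posterior moves from purely data-driven at alpha0 = 0 towards the prior as alpha0 grows:

```r
# Hypothetical prior over three grammar types and corrected counts
p <- c(S = 0.5, V = 0.3, `Sx->S` = 0.2)   # prior probabilities p_k
m <- c(S = 8,   V = 1,   `Sx->S` = 1)     # corrected counts m_k, so M = 10

posterior <- function(alpha0) (alpha0 * p + m) / (alpha0 + sum(m))

posterior(0)    # purely data-driven: m / M = 0.8, 0.1, 0.1
posterior(1)    # default: mild pull towards the prior
posterior(100)  # prior-dominated: close to p
```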
Details
For each sign at position \(i\) in the sentence, the function computes:
1. The raw dictionary counts \(n_k\) for each grammar type \(k\).
2. A correction factor \(x_k = 1 / \mathrm{sentence\_prob}\) for
verb-like types, and \(x_k = 1\) otherwise. The corrected counts are
\(m_k = n_k \cdot x_k\), with total \(M = \sum_k m_k\).
3. The posterior probability (Dirichlet-Multinomial model):
$$\theta_k = \frac{\alpha_0 \, p_k + m_k}{\alpha_0 + M}$$
where \(p_k\) is the prior probability from prior_probs().
For signs not in the dictionary (\(M = 0\)), the posterior equals the
prior. For signs with many observations (\(M \gg \alpha_0\)), the
posterior is dominated by the data.
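The steps above can be sketched in plain R. This is a toy illustration under stated assumptions, not the package implementation: the counts, prior, and sentence_prob value are hypothetical, and only the type "V" is treated as verb-like here.

```r
# Toy inputs: raw dictionary counts n_k for one sign, a prior p_k,
# and a sentence_prob as stored on the prior by prior_probs()
n <- c(S = 6, V = 2, `Sx->S` = 2)        # raw counts n_k
p <- c(S = 0.5, V = 0.3, `Sx->S` = 0.2)  # prior probabilities p_k
sentence_prob <- 0.25                    # hypothetical correction value
alpha0 <- 1

# Correction: inflate counts for verb-like types, leave others unchanged
x <- ifelse(names(n) == "V", 1 / sentence_prob, 1)
m <- n * x                               # corrected counts m_k
M <- sum(m)

# Dirichlet-Multinomial posterior
theta <- (alpha0 * p + m) / (alpha0 + M)
theta        # posterior over types; sums to 1
```

With M = 0 (a sign absent from the dictionary) the formula reduces to theta = p, and for M much larger than alpha0 the prior term becomes negligible, matching the two limiting cases described above.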
See Also
prior_probs for computing the prior,
sign_grammar for the input data,
plot_sign_grammar for visualisation.
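Examples
A hedged end-to-end sketch, assuming the companion functions behave as described above. The file name, the sentence string, and the exact input form expected by sign_grammar are hypothetical.

```r
## Not run:
dic   <- read_dictionary("signs.csv")            # hypothetical dictionary file
prior <- prior_probs(dic)                        # prior with sentence_prob attribute
sg    <- sign_grammar("lugal-e e2 mu-du3", dic)  # hypothetical sentence input
gp    <- grammar_probs(sg, prior, dic, alpha0 = 1)

# Most probable grammar type at each position
by_pos <- split(gp, gp$position)
do.call(rbind, lapply(by_pos, function(d) d[which.max(d$prob), ]))
## End(Not run)
```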