grammar_probs: Posterior Probabilities of Grammatical Types for Each Sign
Description
For each cuneiform sign in a sentence, computes Bayesian posterior
probabilities for all grammatical types, combining prior beliefs from
prior_probs with observed dictionary frequencies. The
dictionary counts are corrected for verb underrepresentation using the
sentence_prob stored in the prior.
Usage
grammar_probs(sg, prior, dic, alpha0 = 1)
Value
A data frame with columns:
position
Integer. Position of the sign in the sentence.
sign_name
Character. The sign name.
cuneiform
Character. The cuneiform character.
type
Character. The grammar type (e.g., "S", "V",
"Sx->S").
prob
Numeric. Posterior probability for this type at this
position.
n
Numeric. Dictionary count for this sign and type.
Arguments
sg
A data frame as returned by sign_grammar.
prior
A named numeric vector as returned by
prior_probs, with a sentence_prob attribute.
dic
A dictionary data frame as returned by
read_dictionary.
alpha0
Numeric (>= 0). Strength of the prior (pseudo sample
size). Larger values pull the posterior towards the prior. When
alpha0 = 0, the result is purely data-driven. Default: 1.
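To illustrate the effect of alpha0, the sketch below (plain R, independent of the package; the prior and counts are made up) shows how the posterior moves from purely data-driven at alpha0 = 0 towards the prior as alpha0 grows:

```r
# Hypothetical prior over three grammar types and corrected counts
p <- c(S = 0.5, V = 0.3, `Sx->S` = 0.2)   # prior probabilities p_k
m <- c(S = 8,   V = 1,   `Sx->S` = 1)     # corrected counts m_k, so M = 10

posterior <- function(alpha0) (alpha0 * p + m) / (alpha0 + sum(m))

posterior(0)    # purely data-driven: m / M = 0.8, 0.1, 0.1
posterior(1)    # default: mild pull towards the prior
posterior(100)  # prior-dominated: close to p
```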
Details
For each sign at position \(i\) in the sentence, the function computes:
1. The raw dictionary counts \(n_k\) for each grammar type \(k\).
2. A correction factor \(x_k = 1 / \mathrm{sentence\_prob}\) for
verb-like types, and \(x_k = 1\) otherwise. The corrected counts are
\(m_k = n_k \cdot x_k\), with total \(M = \sum_k m_k\).
3. The posterior probability (Dirichlet-Multinomial model):
$$\theta_k = \frac{\alpha_0 \, p_k + m_k}{\alpha_0 + M}$$
where \(p_k\) is the prior probability from prior_probs().
For signs not in the dictionary (\(M = 0\)), the posterior equals the
prior. For signs with many observations (\(M \gg \alpha_0\)), the
posterior is dominated by the data.
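The steps above can be sketched in plain R. This is a toy illustration under stated assumptions, not the package implementation: the counts, prior, and sentence_prob value are hypothetical, and only the type "V" is treated as verb-like here.

```r
# Toy inputs: raw dictionary counts n_k for one sign, a prior p_k,
# and a sentence_prob as stored on the prior by prior_probs()
n <- c(S = 6, V = 2, `Sx->S` = 2)        # raw counts n_k
p <- c(S = 0.5, V = 0.3, `Sx->S` = 0.2)  # prior probabilities p_k
sentence_prob <- 0.25                    # hypothetical correction value
alpha0 <- 1

# Correction: inflate counts for verb-like types, leave others unchanged
x <- ifelse(names(n) == "V", 1 / sentence_prob, 1)
m <- n * x                               # corrected counts m_k
M <- sum(m)

# Dirichlet-Multinomial posterior
theta <- (alpha0 * p + m) / (alpha0 + M)
theta        # posterior over types; sums to 1
```

With M = 0 (a sign absent from the dictionary) the formula reduces to theta = p, and for M much larger than alpha0 the prior term becomes negligible, matching the two limiting cases described above.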
See Also
prior_probs for computing the prior,
sign_grammar for the input data,
plot_sign_grammar for visualisation.
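Examples
A hedged end-to-end sketch, assuming the companion functions behave as described above. The file name, the sentence string, and the exact input form expected by sign_grammar are hypothetical.

```r
## Not run:
dic   <- read_dictionary("signs.csv")            # hypothetical dictionary file
prior <- prior_probs(dic)                        # prior with sentence_prob attribute
sg    <- sign_grammar("lugal-e e2 mu-du3", dic)  # hypothetical sentence input
gp    <- grammar_probs(sg, prior, dic, alpha0 = 1)

# Most probable grammar type at each position
by_pos <- split(gp, gp$position)
do.call(rbind, lapply(by_pos, function(d) d[which.max(d$prob), ]))
## End(Not run)
```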