
sumer (version 1.3.0)

grammar_probs: Posterior Probabilities of Grammatical Types for Each Sign

Description

For each cuneiform sign in a sentence, computes Bayesian posterior probabilities for all grammatical types by combining the prior probabilities from prior_probs with the sign's observed counts in the dictionary. Before they are combined, the counts of verb-like types are corrected for the underrepresentation of verbs, using the sentence_prob attribute stored in the prior.

Usage

grammar_probs(sg, prior, dic, alpha0 = 1)

Value

A data frame with columns:

position

Integer. Position of the sign in the sentence.

sign_name

Character. The sign name.

cuneiform

Character. The cuneiform character.

type

Character. The grammar type (e.g., "S", "V", "Sx->S").

prob

Numeric. Posterior probability for this type at this position.

n

Numeric. Number of occurrences of this sign with this type in the dictionary.
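To illustrate how this output can be post-processed, the sketch below builds a small data frame by hand with the same columns and uses base R to pick the most probable type at each position. The values are made up for illustration and are not real grammar_probs() output; the sketch assumes, as the column descriptions suggest, one row per position and type.

```r
# Hand-built mock with the documented columns; values are illustrative only.
gp <- data.frame(
  position  = c(1L, 1L, 2L, 2L),
  sign_name = c("a", "a", "ma", "ma"),
  cuneiform = NA_character_,            # glyphs omitted in this mock
  type      = c("S", "V", "S", "V"),
  prob      = c(0.7, 0.3, 0.4, 0.6),
  n         = c(5, 1, 2, 3),
  stringsAsFactors = FALSE
)

# For each position, keep the row with the highest posterior probability.
best <- do.call(rbind, lapply(split(gp, gp$position),
                              function(d) d[which.max(d$prob), ]))
best[, c("position", "sign_name", "type", "prob")]
```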

Arguments

sg

A data frame as returned by sign_grammar.

prior

A named numeric vector as returned by prior_probs, with a sentence_prob attribute.

dic

A dictionary data frame as returned by read_dictionary.

alpha0

Numeric (>= 0). Strength of the prior (pseudo sample size). Larger values pull the posterior towards the prior. When alpha0 = 0, the result is purely data-driven. Default: 1.

Details

For each sign at position \(i\) in the sentence, the function computes:

  1. The raw dictionary counts \(n_k\) for each grammar type \(k\).

  2. A correction factor \(x_k = 1 / \mathrm{sentence\_prob}\) for verb-like types, \(x_k = 1\) otherwise. The corrected counts are \(m_k = n_k \cdot x_k\) with total \(M = \sum_k m_k\).

  3. The posterior probability (Dirichlet-Multinomial model): $$\theta_k = \frac{\alpha_0 \, p_k + m_k}{\alpha_0 + M}$$ where \(p_k\) is the prior probability from prior_probs().
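A minimal base-R sketch of steps 1 to 3 for a single sign follows. posterior_sketch() is a hypothetical helper, not part of sumer; the counts, the prior, and the choice of which types count as verb-like (here only "V") are assumptions made for illustration.

```r
# Hypothetical helper (not part of the sumer API) implementing steps 1-3
# for one sign: n = raw counts, prior = p_k, verb_like = logical per type.
posterior_sketch <- function(n, prior, verb_like, sentence_prob, alpha0 = 1) {
  stopifnot(identical(names(n), names(prior)), alpha0 >= 0)
  # Step 2: correction factor x_k = 1 / sentence_prob for verb-like types
  x <- ifelse(verb_like, 1 / sentence_prob, 1)
  m <- n * x                      # corrected counts m_k (names kept from n)
  M <- sum(m)                     # total corrected count
  # Step 3: theta_k = (alpha0 * p_k + m_k) / (alpha0 + M)
  (alpha0 * prior + m) / (alpha0 + M)
}

# Illustrative counts and prior for one sign (assumed, not from the dictionary)
n     <- c(S = 6, V = 1, "Sx->S" = 1)
prior <- c(S = 0.5, V = 0.3, "Sx->S" = 0.2)
theta <- posterior_sketch(n, prior, verb_like = c(FALSE, TRUE, FALSE),
                          sentence_prob = 0.25, alpha0 = 1)
round(theta, 3)
```

With sentence_prob = 0.25 the single V count is weighted by 4, so M = 11 and the posterior for V ends up well above V's raw share (1 of 8) of the counts. Because the prior sums to 1, the posterior does too.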

For signs not in the dictionary (\(M = 0\)) and \(\alpha_0 > 0\), the posterior equals the prior. For signs with many observations (\(M \gg \alpha_0\)), the posterior is dominated by the data and approaches the corrected relative frequencies \(m_k / M\).
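Both limiting cases can be checked directly on the formula. The sketch below uses a hypothetical helper, dm_update(), that takes counts which are already corrected. It also shows that with alpha0 = 0 an unseen sign gives 0/0, which R evaluates to NaN in this sketch; how grammar_probs() itself handles that combination is not covered here.

```r
# Hypothetical helper: step 3 only, applied to already-corrected counts m.
dm_update <- function(m, prior, alpha0) (alpha0 * prior + m) / (alpha0 + sum(m))

prior <- c(S = 0.5, V = 0.3, "Sx->S" = 0.2)   # illustrative prior

# Sign absent from the dictionary (M = 0): posterior equals the prior
zero <- c(S = 0, V = 0, "Sx->S" = 0)
dm_update(zero, prior, alpha0 = 1)

# With alpha0 = 0 and M = 0 the formula is 0/0 (NaN in R)
dm_update(zero, prior, alpha0 = 0)

# Many observations (M >> alpha0): posterior is close to m / M
m <- c(S = 900, V = 50, "Sx->S" = 50)
dm_update(m, prior, alpha0 = 1)
m / sum(m)
```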

See Also

prior_probs for computing the prior, sign_grammar for the input data, plot_sign_grammar for visualisation.

Examples

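library(sumer)
dic   <- read_dictionary()                           # load the sign dictionary
sg    <- sign_grammar("a-ma-ru ba-ur3 ra", dic)      # grammar information per sign
prior <- prior_probs(dic, sentence_prob = 0.25)      # prior with sentence_prob attribute
gp    <- grammar_probs(sg, prior, dic, alpha0 = 1)   # posterior per sign and type
print(gp)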
