prior_probs: Prior Probabilities of Grammatical Types

Description

Computes the prior probability of each grammatical type (e.g., S, V, Sx->S, xS->A) from a dictionary. The priors can optionally be corrected for the underrepresentation of verbs in the dictionary data.

Usage

prior_probs(dic, sentence_prob = 1.0)

Value

A named numeric vector with one element per grammatical type found in the dictionary; the elements sum to 1. The names are the type strings as they appear in the dictionary (e.g., "S", "V", "Sx->S"). The value of sentence_prob is stored as an attribute of the result.

Arguments

dic: A dictionary data frame as returned by read_dictionary.
sentence_prob: Numeric in (0, 1]. The estimated proportion of complete sentences (as opposed to noun phrases) in the training data from which the dictionary was created. Verbs appear in complete sentences, so a value less than 1 upweights verb-like types. Default: 1.0.

Details

The function proceeds in three steps:

1. For each single-sign dictionary entry with at least one count, the counts per grammatical type are normalised to sum to 1.
2. The prior probability of each type is the mean of these normalised frequencies across all signs.
3. A correction is applied: counts of verb-like types (V and all operators with return type V, such as Vx->V or xV->V) are multiplied by 1/sentence_prob, then all probabilities are renormalised. This compensates for the fact that verbs are underrepresented when most dictionary entries are obtained from noun phrases rather than complete sentences.

When sentence_prob = 1, no correction is applied.
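
The three steps described above can be illustrated with a minimal sketch. It is not the package's implementation: the toy data frame and its columns (sign, type, count), the helper name sketch_prior_probs, and the attribute name "sentence_prob" are all assumptions made for illustration, and the real output of read_dictionary may be laid out differently. The sketch also assumes that an operator's return type is whatever follows "->" in the type string, and it applies the 1/sentence_prob factor to the averaged frequencies before renormalising.

```r
# Toy dictionary with hypothetical columns; not the real read_dictionary() layout.
toy_dic <- data.frame(
  sign  = c("a", "a", "b", "b", "c"),
  type  = c("S", "V", "S", "Sx->S", "V"),
  count = c(3, 1, 2, 2, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper that mirrors the three documented steps.
sketch_prior_probs <- function(dic, sentence_prob = 1.0) {
  stopifnot(sentence_prob > 0, sentence_prob <= 1)

  # Step 1: build a signs x types count matrix and normalise each row to sum to 1.
  counts <- tapply(dic$count, list(dic$sign, dic$type), sum)
  counts[is.na(counts)] <- 0                       # type never seen for this sign
  counts <- counts[rowSums(counts) > 0, , drop = FALSE]
  freqs  <- counts / rowSums(counts)

  # Step 2: the prior of each type is its mean normalised frequency across signs.
  prior <- colMeans(freqs)

  # Step 3: upweight V and operators returning V (assumed to end in "->V"),
  # then renormalise so the priors sum to 1 again.
  verb_like <- grepl("(^|->)V$", names(prior))
  prior[verb_like] <- prior[verb_like] / sentence_prob
  prior <- prior / sum(prior)

  attr(prior, "sentence_prob") <- sentence_prob    # attribute name is an assumption
  prior
}

sketch_prior_probs(toy_dic)                        # no correction
sketch_prior_probs(toy_dic, sentence_prob = 0.25)  # verb-like types upweighted
```

On this toy data, lowering sentence_prob from 1 to 0.25 raises the prior of V from about 0.42 to about 0.74, while the priors of S and Sx->S shrink proportionally.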

Examples

dic <- read_dictionary()

# Default usage
prior_probs(dic)

# Applying the correction (only 25% of the training data are complete sentences)
prior_probs(dic, sentence_prob = 0.25)