prior_probs: Prior Probabilities of Grammatical Types
Description
Computes prior probabilities for each grammatical type (e.g., S,
V, Sx->S, xS->A, etc.) from a dictionary. The priors
can be corrected for verb underrepresentation in the dictionary data.
Usage
prior_probs(dic, sentence_prob = 1.0)
Value
A named numeric vector with one element per grammatical type found in
the dictionary, summing to 1. The names are the type strings as they
appear in the dictionary (e.g., "S", "V", "Sx->S").
The value of sentence_prob is stored as an attribute of the returned vector.
Arguments
dic
A dictionary data frame as returned by
read_dictionary.
sentence_prob
Numeric in (0, 1]. The estimated proportion of
complete sentences (as opposed to noun phrases) in the training data
from which the dictionary was created. Verbs appear in complete
sentences, so a value less than 1 upweights verb-like types.
Default: 1.0.
Details
The function proceeds in three steps:
1. For each single-sign dictionary entry with at least one count,
   the counts per grammatical type are normalised to sum to 1.
2. The prior probability of each type is the mean of these
   normalised frequencies across all signs.
3. A correction is applied: counts of verb-like types (V and all
   operators with return type V, such as Vx->V or
   xV->V) are multiplied by 1/sentence_prob, then all
   probabilities are renormalised. This compensates for the fact that
   verbs are underrepresented when most dictionary entries are obtained
   from noun phrases rather than complete sentences.
When sentence_prob = 1, no correction is applied.
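The three steps can be illustrated with a minimal base R sketch. It is not the package's implementation. It assumes a hypothetical long-format dictionary with columns sign, type and count (the data frame returned by read_dictionary may be laid out differently), it treats every row as a single-sign entry, it applies the verb correction to the averaged priors from step 2, and it assumes the attribute is named "sentence_prob". The helper name sketch_prior_probs is likewise made up for illustration.

```r
# Hypothetical toy dictionary: one row per (sign, type) pair with a count.
# Column names are assumptions, not the documented read_dictionary format.
dic <- data.frame(
  sign  = c("A", "A", "B", "B", "C"),
  type  = c("S", "V", "S", "Sx->S", "V"),
  count = c(3, 1, 2, 2, 4),
  stringsAsFactors = FALSE
)

# Illustrative re-implementation of the three documented steps.
sketch_prior_probs <- function(dic, sentence_prob = 1.0) {
  stopifnot(sentence_prob > 0, sentence_prob <= 1)

  # Step 1: build a signs x types count matrix, keep signs with at least
  # one count, and normalise each sign's counts to sum to 1.
  counts <- unclass(xtabs(count ~ sign + type, data = dic))
  counts <- counts[rowSums(counts) > 0, , drop = FALSE]
  freqs  <- counts / rowSums(counts)

  # Step 2: the prior of each type is the mean normalised frequency
  # across signs.
  priors <- colMeans(freqs)

  # Step 3: upweight V and every operator whose return type is V
  # (names ending in "->V"), then renormalise so the priors sum to 1.
  verb_like <- names(priors) == "V" | grepl("->V$", names(priors))
  priors[verb_like] <- priors[verb_like] / sentence_prob
  priors <- priors / sum(priors)

  # Assumed attribute name; the help page only says an attribute is stored.
  attr(priors, "sentence_prob") <- sentence_prob
  priors
}

p_uncorrected <- sketch_prior_probs(dic)
p_corrected   <- sketch_prior_probs(dic, sentence_prob = 0.5)
```

On this toy data the prior for V is 5/12 (about 0.417) without correction and 10/17 (about 0.588) with sentence_prob = 0.5, while the priors for S and Sx->S shrink proportionally so that the vector still sums to 1.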
See Also
sign_grammar for per-sign grammatical type frequencies.