The goal is to set a prior that takes into account the number of
annotated variants for genes with E exons, as well as the number of exons in
each variant.
Suppose we have a gene with E exons.
Let V_1,...,V_n be n variants of interest and let |V_1|,...,|V_n| be the
corresponding number of exons in each variant.
The prior probability of variants V_1,...,V_n being expressed is modeled
as P(V_1,...,V_n|E)= P(n|E) P(|V_1| |E) ... P(|V_n| |E)
where P(n|E)= NegBinom(n; k_E, r_E) I(0 < n < 2^E) and
P(|V_i| |E)= BetaBinomial(|V_i|-1; E-1, alpha_E, beta_E).
The parameters k_E, r_E, alpha_E, beta_E depend on E (the number of exons
in the gene) and are estimated from the available annotation via
maximum likelihood.
Parameters are estimated jointly for all genes with E>=
maxExons
in order to improve the precision.
For smooth==TRUE
, alpha_E and beta_E are modeled as a smooth
function of E by calling gam
and setting the smoothing
parameter via cross-validation. Estimates for genes with E>=10 are
substituted by their smooth versions, which typically helps improve
stability in the estimates.