The following lines are extracted from Laio et al. (2008). See the paper for more details and references.
Model selection criteria
The problem of model selection can be formalized as follows. A sample of $n$ data, $D=(x_1, \dots, x_n)$, arranged in ascending order, is available, drawn from an unknown parent distribution $f(x)$.
A set of $N_m$ operating models, $M_j$, $j=1,\dots, N_m$, is used to represent the data.
The operating models are probability distributions, $M_j = g_j(x,\hat{\theta})$, whose parameters $\hat{\theta}$ are estimated from the available sample $D$.
The aim of model selection is to identify the model $M_{opt}$ best suited to represent the data, i.e., the model closest to the parent distribution $f(x)$ according to some measure of discrepancy.
Three different model selection criteria are considered here, namely, the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the Anderson-Darling Criterion (ADC).
Of the three methods, the first two are classical approaches from the literature, while the third derives from a heuristic interpretation of the results of a standard goodness-of-fit test (see Laio, 2004).
Akaike Information Criterion
The Akaike Information Criterion (AIC) for the $j$-th operating model is computed as
$$AIC_j = -2 \ln\left(L_j(\hat{\theta})\right) + 2 p_j$$
where
$$L_j(\hat{\theta}) = \prod_{i=1}^n g_j(x_i, \hat{\theta})$$
is the likelihood function, evaluated at $\theta=\hat{\theta}$, the maximum likelihood estimate of the parameter vector $\theta$, and $p_j$ is the number of estimated parameters of the $j$-th operating model.
In practice, after computing $AIC_j$ for all of the operating models, one selects the model with the minimum AIC value, $AIC_{min}$.
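As an illustration only, the AIC computation and the minimum-AIC rule can be sketched in Python. The two operating models (a normal and an exponential distribution, both with closed-form maximum likelihood estimates), the helper names and the ten-value sample are hypothetical choices made for this sketch; they are not taken from Laio et al. (2008).

```python
import math


def normal_max_loglik(x):
    """Maximized log-likelihood of a normal model (p = 2: mean and variance)."""
    n = len(x)
    mu = sum(x) / n
    var = sum((xi - mu) ** 2 for xi in x) / n  # ML variance divides by n, not n - 1
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def exponential_max_loglik(x):
    """Maximized log-likelihood of an exponential model (p = 1: rate)."""
    n = len(x)
    rate = n / sum(x)  # ML estimate of the rate is 1 / sample mean
    return n * math.log(rate) - n


def aic(loglik, p):
    """AIC_j = -2 ln L_j(theta_hat) + 2 p_j."""
    return -2.0 * loglik + 2 * p


# Hypothetical toy sample (positive values), sorted in ascending order.
data = sorted([3.1, 0.4, 1.7, 2.2, 0.9, 5.6, 1.1, 2.8, 0.6, 4.0])

# Operating models: name -> (maximized log-likelihood, number of parameters p_j)
models = {
    "normal": (normal_max_loglik(data), 2),
    "exponential": (exponential_max_loglik(data), 1),
}

scores = {name: aic(ll, p) for name, (ll, p) in models.items()}
best = min(scores, key=scores.get)  # the model attaining AIC_min
print(scores, "->", best)
```

Any other family can be added to `models` in the same way, as long as its maximized log-likelihood and parameter count are supplied.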
When the sample size $n$ is small with respect to the number of estimated parameters $p$, the AIC may perform poorly. In those cases a second-order variant of AIC, called AICc, should be used:
$$AICc_j = -2 \ln\left(L_j(\hat{\theta})\right) + 2 p_j \frac{n}{n - p_j - 1}$$
As a rule of thumb, AICc should be used when $n/p < 40$.
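A minimal sketch of the AICc correction and of the $n/p < 40$ rule of thumb, assuming the maximized log-likelihood has already been computed. The function names and the numbers in the usage lines are hypothetical.

```python
def aicc(loglik, p, n):
    """AICc_j = -2 ln L_j(theta_hat) + 2 p_j n / (n - p_j - 1)."""
    if n - p - 1 <= 0:
        raise ValueError("AICc requires n > p + 1")
    return -2.0 * loglik + 2 * p * n / (n - p - 1)


def prefer_aicc(n, p, threshold=40):
    """Rule of thumb from the text: use AICc instead of AIC when n / p < 40."""
    return n / p < threshold


# Hypothetical case: a 3-parameter model fitted to 30 observations (n / p = 10).
loglik, p, n = -52.3, 3, 30
if prefer_aicc(n, p):
    print("AICc =", aicc(loglik, p, n))
```

The factor $n/(n - p_j - 1)$ tends to 1 as $n$ grows, so AICc converges to AIC for large samples and the correction matters only when $n$ is small relative to $p_j$.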
Bayesian Information Criterion
The Bayesian Information Criterion (BIC) for the $j$-th operating model reads
$$BIC_j = -2 \ln\left(L_j(\hat{\theta})\right) + p_j \ln(n)$$
In practical applications, after computing $BIC_j$ for all of the operating models, one selects the model with the minimum BIC value, $BIC_{min}$.
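BIC differs from AIC only in the per-parameter penalty: $\ln(n)$ instead of 2. Since $\ln(n) > 2$ for $n \ge 8$ ($e^2 \approx 7.39$), BIC penalizes additional parameters more heavily than AIC for all but very small samples. The sketch below, with hypothetical helper names and a made-up log-likelihood, makes this comparison explicit.

```python
import math


def aic(loglik, p):
    """AIC_j = -2 ln L_j(theta_hat) + 2 p_j."""
    return -2.0 * loglik + 2 * p


def bic(loglik, p, n):
    """BIC_j = -2 ln L_j(theta_hat) + ln(n) p_j."""
    return -2.0 * loglik + math.log(n) * p


# The log-likelihood cancels in BIC - AIC, leaving only p (ln n - 2):
# negative for n <= 7, positive for n >= 8.
for n in (5, 8, 50, 500):
    print(n, bic(-100.0, 3, n) - aic(-100.0, 3))
```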
Anderson-Darling Criterion
The Anderson-Darling criterion has the form:
$$ADC_j = 0.0403 + 0.116 ((\Delta_{AD,j} - \epsilon_j)/\beta_j)^{(\eta_j/0.851)}$$
if $1.2 \epsilon_j < \Delta_{AD,j}$,
$$ADC_j = [0.0403 + 0.116 ((0.2 \epsilon_j)/\beta_j)^{(\eta_j/0.851)}] ((\Delta_{AD,j} - 0.2 \epsilon_j)/\epsilon_j)$$
if $1.2 \epsilon_j \ge \Delta_{AD,j}$,
where $\Delta_{AD,j}$ is the discrepancy measure characterizing the criterion, namely the Anderson-Darling statistic $A^2$ (A2 in GOFlaio2004), and $\epsilon_j$, $\beta_j$ and $\eta_j$ are distribution-dependent coefficients tabulated by Laio [2004, Tables 3 and 5] for a set of seven distributions commonly employed in the frequency analysis of extreme events.
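The two-branch definition of $ADC_j$ can be written as a small function. This is a sketch under the assumption that $\Delta_{AD,j}$ and the coefficients $\epsilon_j$, $\beta_j$, $\eta_j$ are supplied by the caller: no values from Laio's tables are reproduced here, and the coefficient values in the usage lines are placeholders.

```python
def adc(delta_ad, eps, beta, eta):
    """Piecewise Anderson-Darling criterion ADC_j.

    delta_ad: Anderson-Darling statistic of the j-th fitted model.
    eps, beta, eta: distribution-dependent coefficients (Laio, 2004, Tables 3 and 5);
    they must be supplied by the caller, none are hard-coded here.
    """
    if 1.2 * eps < delta_ad:
        return 0.0403 + 0.116 * ((delta_ad - eps) / beta) ** (eta / 0.851)
    # Below the threshold the criterion scales linearly with delta_ad.
    base = 0.0403 + 0.116 * ((0.2 * eps) / beta) ** (eta / 0.851)
    return base * (delta_ad - 0.2 * eps) / eps


# Placeholder coefficients, NOT taken from Laio (2004).
eps, beta, eta = 0.2, 0.1, 0.5
for a2 in (0.1, 0.24, 0.5):
    print(a2, adc(a2, eps, beta, eta))
```

At $\Delta_{AD,j} = 1.2 \epsilon_j$ both branches give $0.0403 + 0.116 (0.2 \epsilon_j/\beta_j)^{(\eta_j/0.851)}$, so the criterion is continuous at the switch point.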
In practice, after computing $ADC_j$ for all of the operating models, one selects the model with the minimum ADC value, $ADC_{min}$.
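The discrepancy $\Delta_{AD,j}$ is the Anderson-Darling statistic of the $j$-th fitted model. Assuming the standard form of that statistic, $A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F(x_i) + \ln\left(1 - F(x_{n+1-i})\right)\right]$ with the sample in ascending order and $F$ the fitted distribution function, it can be computed as sketched below. The function name, the exponential model fitted by maximum likelihood and the toy sample are hypothetical examples.

```python
import math


def anderson_darling_a2(x, cdf):
    """Anderson-Darling statistic A^2 of sample x against a fitted CDF.

    A^2 = -n - (1/n) sum_{i=1}^{n} (2i - 1) [ln F(x_i) + ln(1 - F(x_{n+1-i}))],
    with x_1 <= ... <= x_n. All F values must lie strictly inside (0, 1).
    """
    xs = sorted(x)
    n = len(xs)
    u = [cdf(v) for v in xs]
    s = sum(
        (2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
        for i in range(1, n + 1)
    )
    return -n - s / n


# Hypothetical usage: exponential model fitted by maximum likelihood to a toy sample.
data = [3.1, 0.4, 1.7, 2.2, 0.9, 5.6, 1.1, 2.8, 0.6, 4.0]
rate = len(data) / sum(data)
a2 = anderson_darling_a2(data, lambda v: 1.0 - math.exp(-rate * v))
print("A^2 =", a2)
```

The resulting value would then play the role of $\Delta_{AD,j}$ in the ADC formula, together with the coefficients tabulated for the corresponding distribution.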