The following formula is used for VGLMs:
  \(-2 \mbox{log-likelihood} + k n_{par}\), where \(n_{par}\) represents the number of parameters
  in the fitted model, and \(k = 2\) for the usual AIC.
  Setting \(k = \log(n)\), where \(n\) is the number of observations,
  gives the so-called BIC or SBC (Schwarz's Bayesian criterion).
  This is implemented by the function AICvlm().
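
As a rough illustration of the formula above, the following Python sketch evaluates \(-2 \mbox{log-likelihood} + k n_{par}\) directly. It is not the code of AICvlm(); the helper name `information_criterion` and the numeric inputs are hypothetical and chosen only to show how \(k = 2\) and \(k = \log(n)\) differ.

```python
import math


def information_criterion(loglik, n_par, k=2.0):
    """Return -2 * loglik + k * n_par (hypothetical helper, not part of VGAM).

    loglik : maximized log-likelihood of the fitted model
    n_par  : number of parameters in the fitted model
    k      : penalty per parameter; 2 gives the usual AIC
    """
    return -2.0 * loglik + k * n_par


# Made-up inputs, for illustration only.
loglik = -120.5
n_par = 4
n_obs = 100

aic = information_criterion(loglik, n_par)                      # k = 2
bic = information_criterion(loglik, n_par, k=math.log(n_obs))   # k = log(n)

print(aic)
print(bic)
```

Because \(\log(n) > 2\) once \(n \geq 8\), the BIC penalizes each additional parameter more heavily than the AIC for all but very small samples.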
This code relies on the log-likelihood being defined, and computed,
  for the object.
  When comparing fitted objects, the smaller the AIC, the better the fit.
  The log-likelihood, and hence the AIC, is only defined up to an additive
  constant.
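
One consequence is that only differences in AIC between models fitted to the same data are meaningful. The Python sketch below, with a hypothetical helper `aic` and made-up log-likelihoods, shows that shifting every log-likelihood by the same constant changes each AIC but leaves their difference unchanged.

```python
def aic(loglik, n_par, k=2.0):
    """Return -2 * loglik + k * n_par (hypothetical helper, not part of VGAM)."""
    return -2.0 * loglik + k * n_par


# Two made-up fitted models: (log-likelihood, number of parameters).
loglik_a, n_par_a = -100.0, 3
loglik_b, n_par_b = -104.0, 5

# An arbitrary additive constant, e.g. from dropping terms of the
# log-density that do not involve the parameters.
shift = 12.5

diff_original = aic(loglik_b, n_par_b) - aic(loglik_a, n_par_a)
diff_shifted = aic(loglik_b + shift, n_par_b) - aic(loglik_a + shift, n_par_a)

print(diff_original, diff_shifted)
```

The comparison is valid only when the same constant applies to every model being compared.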
Any estimated scale parameter (in GLM parlance) is counted as one
  parameter.
For VGAMs and CAO the nonlinear effective degrees of freedom of each
  smoothed component are used. This formula is heuristic.
  These are the functions AICvgam() and AICcao().
The finite sample correction is usually recommended when the
  sample size is small or when the number of parameters is large.
  When the sample size is large, the difference between the corrected and
  uncorrected AIC tends to be negligible.
  The correction is described in Hurvich and Tsai (1989), and is based
  on a (univariate) linear model with normally distributed errors.
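
The Python sketch below illustrates the size of the correction. It assumes the commonly quoted form of the corrected criterion, \(AIC + 2 n_{par} (n_{par} + 1) / (n - n_{par} - 1)\); whether the functions described here use exactly this expression is an assumption, and the helper name `aic_corrected` and the inputs are hypothetical.

```python
def aic_corrected(loglik, n_par, n_obs):
    """Finite-sample corrected AIC (hypothetical helper, not part of VGAM).

    Assumes the commonly quoted form
        AIC + 2 * n_par * (n_par + 1) / (n_obs - n_par - 1).
    """
    denom = n_obs - n_par - 1
    if denom <= 0:
        raise ValueError("correction requires n_obs > n_par + 1")
    aic = -2.0 * loglik + 2.0 * n_par
    return aic + 2.0 * n_par * (n_par + 1) / denom


# Made-up inputs: same log-likelihood and parameter count,
# one small and one large sample size.
loglik, n_par = -120.5, 4
uncorrected = -2.0 * loglik + 2.0 * n_par

correction_small = aic_corrected(loglik, n_par, n_obs=20) - uncorrected
correction_large = aic_corrected(loglik, n_par, n_obs=2000) - uncorrected

print(correction_small)
print(correction_large)
```

With 4 parameters the correction is a few units when \(n = 20\) but only a few hundredths when \(n = 2000\), which is why the two criteria nearly coincide for large samples.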