Fletcher.chat: Estimate overdispersion

Description

General function for estimating a variance inflation factor ($\hat c$) from observed and expected counts.

Usage

Fletcher.chat (observed, expected, np, verbose = TRUE, 
    type = c('Fletcher', 'Wedderburn', 'both'), multinomial = FALSE)

Value

Output depends on `verbose', `observed' and `type':

-- if `observed' is a list of nk vectors (usually generated by simulation) then the output is a vector of (Fletcher or Wedderburn) $\hat c$ values, one element for each component of `observed', unless type = "both" when the output is a list of two such vectors. Argument `verbose' is ignored.

-- if `observed' is a simple vector then `verbose' output is a list comprising input values, various summary statistics, and the computed Fletcher overdispersion (`chat'). The statistic `cX2' is the conventional variance inflation factor of Wedderburn (1974) -- $X^2/df$. For verbose = FALSE, a single estimate of $\hat c$ is returned when type = "Fletcher" or type = "Wedderburn", otherwise a vector of the two estimates.

Arguments

observed: integer vector of observed counts, or a list of such vectors
expected: numeric vector of expected counts
np: integer number of parameters estimated
verbose: logical; if TRUE returns extended output
type: character; which estimate to return: 'Fletcher', 'Wedderburn' or 'both'
multinomial: logical; if TRUE, one degree of freedom is subtracted for the multinomial constraint

Details

Fletcher.chat applies the overdispersion formula of Fletcher (2012) or computes the conventional (Wedderburn 1974) variance inflation factor $X^2/df$. It is used by chat.nj.

A conventional variance inflation factor due to Wedderburn (1974) is $\hat c_X = X^2/(K-p)$ where $K$ is the number of detectors, $p$ is the number of estimated parameters, and $$X^2 = \sum_k (n_k - E(n_k))^2 / E(n_k),$$ with $n_k$ the observed count and $E(n_k)$ the expected count at detector $k$.
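
As an illustration only, the formula above can be sketched in base R. The helper name `wedderburn_chat` and the counts are invented for this example and are not part of the package; the sketch ignores the `multinomial' adjustment.

```r
# Hypothetical helper (not the package implementation): X^2 / (K - p)
wedderburn_chat <- function(observed, expected, np) {
  K  <- length(observed)                          # number of detectors
  X2 <- sum((observed - expected)^2 / expected)   # Pearson chi-square statistic
  X2 / (K - np)                                   # divide by degrees of freedom
}

# Invented counts for K = 5 detectors and p = 2 estimated parameters
n_k <- c(3, 0, 5, 2, 6)   # observed
E_k <- c(2, 1, 4, 4, 5)   # expected
cX  <- wedderburn_chat(n_k, E_k, np = 2)
print(cX)
```

Values near 1 suggest little overdispersion relative to the fitted model; the estimate becomes unstable when some expected counts are small.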

Fletcher's $\hat c$ is an improvement on $\hat c_X$ that is less affected by small expected counts. It is defined by $$\hat c = \hat c_X / (1 + \bar s),$$ where $\bar s = \sum_k s_k / K$ and $s_k = (n_k - E(n_k)) / E(n_k)$.
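
A minimal base R sketch of Fletcher's correction follows, assuming plain numeric vectors of observed and expected counts. The name `fletcher_chat_sketch` and the data are hypothetical, and the code is not claimed to reproduce the internals of Fletcher.chat.

```r
# Hypothetical helper (not the package implementation)
fletcher_chat_sketch <- function(observed, expected, np) {
  K    <- length(observed)
  cX   <- sum((observed - expected)^2 / expected) / (K - np)  # Wedderburn X^2/df
  sbar <- mean((observed - expected) / expected)              # mean of s_k
  cX / (1 + sbar)                                             # Fletcher's c-hat
}

# Invented counts for K = 5 detectors and p = 2 estimated parameters
n_k  <- c(3, 0, 5, 2, 6)
E_k  <- c(2, 1, 4, 4, 5)
chat <- fletcher_chat_sketch(n_k, E_k, np = 2)
print(chat)
```

In this example $\bar s$ is negative, so the Fletcher estimate is larger than $X^2/df$; when $\bar s$ is positive the correction shrinks the estimate instead.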

The inputs `observed' and `expected' are vectors of counts (e.g., number of distinct individuals per detector); `observed' may also be a list of such vectors, possibly simulated.
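
The list form of `observed' can be mimicked in base R as below. This is a sketch under stated assumptions: the counts are simulated as independent Poisson draws, `np = 2` is arbitrary, and `chat_one` is a hypothetical stand-in for the per-vector calculation rather than the package code.

```r
set.seed(1)
E_k <- c(2, 1, 4, 4, 5)   # invented expected counts for 5 detectors

# List of 100 simulated count vectors (Poisson counts are an assumption here)
sims <- replicate(100, rpois(length(E_k), E_k), simplify = FALSE)

# Hypothetical per-vector Fletcher estimate
chat_one <- function(observed, expected, np) {
  cX <- sum((observed - expected)^2 / expected) / (length(observed) - np)
  cX / (1 + mean((observed - expected) / expected))
}

# One estimate per simulated vector, returned as a numeric vector
chats <- sapply(sims, chat_one, expected = E_k, np = 2)
summary(chats)
```

Applying the same `expected' vector to every component mirrors the documented arguments, where `expected' is a single numeric vector.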

References

Fletcher, D. (2012) Estimating overdispersion when fitting a generalized linear model to sparse data. Biometrika 99, 230--237.

Wedderburn, R. W. M. (1974) Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika 61, 439--447.