fit_two_distr: Maximum likelihood fitting of two distributions and goodness-of-fit comparison.

Description

The distributions fitted depend on the kind of data provided. By default, the Poisson and negative binomial distributions are fitted to count data, whereas the binomial and beta-binomial distributions are fitted to incidence data. One distribution represents the randomness assumption (Poisson or binomial) and the other the aggregation assumption (negative binomial or beta-binomial). The two fits are then compared using a log-likelihood ratio test.
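The comparison described above can be illustrated with base R alone. The sketch below is not the package's implementation: it uses simulated (made-up) counts, fits a Poisson and a negative binomial distribution by maximum likelihood, and compares them with a log-likelihood ratio test. The names `negll_nbinom`, `llr_stat` and `p_value` are hypothetical, and the degrees of freedom are assumed to be the difference between the two entries of a vector shaped like the default `n_est`.

```r
# Simulated counts (made-up data, for illustration only).
set.seed(123)
x <- rnbinom(200, size = 1, mu = 5)

# Random pattern: Poisson, one parameter (the MLE of lambda is the sample mean).
lambda_hat <- mean(x)
ll_pois <- sum(dpois(x, lambda = lambda_hat, log = TRUE))

# Aggregated pattern: negative binomial, two parameters (size and mu),
# optimised on the log scale so that both stay positive.
negll_nbinom <- function(par) {
  -sum(dnbinom(x, size = exp(par[1]), mu = exp(par[2]), log = TRUE))
}
fit_nb <- optim(c(0, log(mean(x))), negll_nbinom)
ll_nb <- -fit_nb$value

# Log-likelihood ratio statistic and approximate chi-squared p-value.
n_est <- c(random = 1, aggregated = 2)
llr_stat <- 2 * (ll_nb - ll_pois)
df <- unname(n_est["aggregated"] - n_est["random"])
p_value <- pchisq(llr_stat, df = df, lower.tail = FALSE)
c(statistic = llr_stat, df = df, p_value = p_value)
```

A small p-value favours the aggregated distribution. Because the Poisson distribution is a boundary case of the negative binomial, the chi-squared reference distribution is only an approximation here.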

Usage

fit_two_distr(data, ...)
# S3 method for default
fit_two_distr(data, random, aggregated, ...)
# S3 method for count
fit_two_distr(data, random = smle_pois,
  aggregated = smle_nbinom, n_est = c(random = 1, aggregated = 2), ...)
# S3 method for incidence
fit_two_distr(data, random = smle_binom,
  aggregated = smle_betabinom, n_est = c(random = 1, aggregated = 2), ...)

Arguments

data

An intensity object.

...

Additional arguments to be passed to other methods.

random

Distribution to describe random patterns.

aggregated

Distribution to describe aggregated patterns.

n_est

Named numeric vector giving the number of estimated parameters for each distribution (random and aggregated).

Value

An object of class fit_two_distr, which is a list containing at least the following components:

`call`	The function call.
`name`	The names of both distributions.
`model`	The outputs of the fitting process for both distributions.
`llr`	The result of the log-likelihood ratio test.

Other components can be present such as:

`param`	A numeric matrix of estimated parameters (that can be printed using `printCoefmat`).
`freq`	A data frame or a matrix with the observed and expected frequencies for both distributions for the different categories.
`gof`	Goodness-of-fit tests for both distributions (which are typically chi-squared goodness-of-fit tests).

Details

Under the hood, fit_two_distr relies on the smle utility, which is a wrapper around the optim procedure.
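To give an idea of what a wrapper around optim can look like, here is a minimal sketch. It is not the smle source code: `mle_sketch` and `negll_pois` are hypothetical names, the data are simulated, and the choice of the BFGS method and of a log-scale parameter are assumptions made for this illustration.

```r
# Hypothetical helper: minimise a negative log-likelihood with optim() and
# derive standard errors from the inverse of the numerical Hessian.
mle_sketch <- function(negloglik, start, ...) {
  fit <- optim(par = start, fn = negloglik, gr = NULL, ...,
               method = "BFGS", hessian = TRUE)
  list(
    par       = fit$par,
    se        = sqrt(diag(solve(fit$hessian))),
    loglik    = -fit$value,
    converged = fit$convergence == 0
  )
}

# Negative log-likelihood of a Poisson distribution, with lambda = exp(par)
# so that the optimiser can never propose a negative rate.
negll_pois <- function(par, x) {
  -sum(dpois(x, lambda = exp(par), log = TRUE))
}

# Simulated counts (made-up data, for illustration only).
set.seed(1)
x <- rpois(150, lambda = 3)
res <- mle_sketch(negll_pois, start = 1, x = x)

# The back-transformed estimate should be close to the sample mean,
# which is the closed-form MLE of lambda.
c(estimate = exp(res$par), sample_mean = mean(x))
```

The standard error returned in `res$se` is on the log scale, since that is the scale on which the optimisation is carried out.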

Note that warnings about chi-squared goodness-of-fit tests may appear if any expected count is less than 5 (Cochran's rule of thumb).
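One common way to deal with small expected counts is to pool sparse tail categories before computing the chi-squared statistic. The sketch below illustrates this for a Poisson fit using base R and simulated (made-up) counts. The `pool_tail` helper is hypothetical, and nothing here is meant to describe how the package itself builds its goodness-of-fit tests.

```r
# Simulated counts (made-up data, for illustration only).
set.seed(42)
x <- rpois(100, lambda = 2)
lambda_hat <- mean(x)
k_max <- max(x)

# Observed and expected frequencies for the categories 0, 1, ..., ">= k_max".
observed <- tabulate(x + 1, nbins = k_max + 1)
prob <- c(dpois(0:(k_max - 1), lambda_hat),
          ppois(k_max - 1, lambda_hat, lower.tail = FALSE))
expected <- length(x) * prob

# Hypothetical helper: merge the last two categories until the expected
# frequency of the upper tail reaches the chosen minimum.
pool_tail <- function(obs, exp, min_exp = 5) {
  while (length(exp) > 2 && exp[length(exp)] < min_exp) {
    n <- length(exp)
    obs <- c(obs[seq_len(n - 2)], obs[n - 1] + obs[n])
    exp <- c(exp[seq_len(n - 2)], exp[n - 1] + exp[n])
  }
  list(observed = obs, expected = exp)
}
pooled <- pool_tail(observed, expected)

# Chi-squared statistic; one degree of freedom is lost for the total and
# one for the estimated parameter (lambda).
stat <- sum((pooled$observed - pooled$expected)^2 / pooled$expected)
df <- length(pooled$observed) - 1 - 1
p_value <- pchisq(stat, df = df, lower.tail = FALSE)
c(statistic = stat, df = df, p_value = p_value)
```

This helper only pools the upper tail. Sparse categories at the lower end, which can occur with larger means, would need the same treatment.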

References

Madden LV, Hughes G. 1995. Plant disease incidence: Distributions, heterogeneity, and temporal analysis. Annual Review of Phytopathology 33(1): 529–564. doi:10.1146/annurev.py.33.090195.002525

Examples

# NOT RUN {
# Simple workflow for count data:
my_data <- count(arthropods)
my_data <- split(my_data, by = "t")[[3]]
my_res  <- fit_two_distr(my_data)
summary(my_res)
plot(my_res)

# Simple workflow for incidence data:
my_data <- incidence(tobacco_viruses)
my_res  <- fit_two_distr(my_data)
summary(my_res)
plot(my_res)

# Note that there are other methods to fit some common distributions.
# For example for the Poisson distribution, one can use glm:
my_arthropods <- arthropods[arthropods$t == 3, ]
my_model <- glm(my_arthropods$i ~ 1, family = poisson)
lambda <- exp(coef(my_model)[[1]]) # unique(my_model$fitted.values) works also.
lambda
# ... or the fitdistr function in the MASS package:
library(MASS)
fitdistr(my_arthropods$i, "poisson")

# For the binomial distribution, glm still works:
my_model <- with(tobacco_viruses, glm(i/n ~ 1, family = binomial, weights = n))
prob <- logit(coef(my_model)[[1]], rev = TRUE)
prob
# ... but the binomial distribution is not yet recognized by MASS::fitdistr.

# Examples featured in Madden et al. (2007).
# p. 242-243
my_data <- incidence(dogwood_anthracnose)
my_data <- split(my_data, by = "t")
my_fit_two_distr <- lapply(my_data, fit_two_distr)
lapply(my_fit_two_distr, function(x) x$param$aggregated[c("prob", "theta"), ])
lapply(my_fit_two_distr, plot)

my_agg_index <- lapply(my_data, agg_index)
lapply(my_agg_index, function(x) x$index)
lapply(my_agg_index, chisq.test)

# }