epiphy (version 0.3.4)

fit_two_distr: Maximum likelihood fitting of two distributions and goodness-of-fit comparison.

Description

Different distributions may be used depending on the kind of data provided. By default, the Poisson and negative binomial distributions are fitted to count data, whereas the binomial and beta-binomial distributions are fitted to incidence data. The first distribution of each pair (Poisson or binomial) corresponds to a randomness assumption, and the second (negative binomial or beta-binomial) to an aggregation assumption. Both distributions are fitted by maximum likelihood, and their goodness of fit is then compared using a log-likelihood ratio test.
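The log-likelihood ratio test described above can be sketched in base R as follows. This is a minimal illustration on simulated counts, not epiphy's internal code. The Poisson MLE uses its closed form (the sample mean), and the negative binomial is fitted with optim() on log-transformed parameters. The degrees of freedom are taken as the difference in the number of estimated parameters, mirroring the default n_est. Because the Poisson is the limiting case of the negative binomial (size tending to infinity), the chi-squared reference distribution is conservative here.

```r
# Minimal sketch (not epiphy's internal code): log-likelihood ratio test
# comparing a Poisson fit (random) with a negative binomial fit (aggregated).
set.seed(42)
x <- rnbinom(200, size = 2, mu = 4)  # simulated, overdispersed counts

# Poisson MLE has a closed form: lambda_hat = mean(x).
ll_pois <- sum(dpois(x, lambda = mean(x), log = TRUE))

# Negative binomial MLE via optim(); parameters are log-transformed to stay positive.
nll_nbinom <- function(par) {
  -sum(dnbinom(x, size = exp(par[1]), mu = exp(par[2]), log = TRUE))
}
fit_nb <- optim(c(0, log(mean(x))), nll_nbinom)
ll_nbinom <- -fit_nb$value

# Statistic and p-value; df = difference in the number of estimated parameters.
n_est <- c(random = 1, aggregated = 2)
llr_stat <- 2 * (ll_nbinom - ll_pois)
dof <- n_est[["aggregated"]] - n_est[["random"]]
p_value <- pchisq(llr_stat, df = dof, lower.tail = FALSE)
c(statistic = llr_stat, df = dof, p.value = p_value)
```

With clearly aggregated data like these, the statistic is large and the p-value is small, favouring the aggregated distribution.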

Usage

fit_two_distr(data, ...)

# S3 method for default
fit_two_distr(data, random, aggregated, ...)

# S3 method for count
fit_two_distr(data, random = smle_pois, aggregated = smle_nbinom,
  n_est = c(random = 1, aggregated = 2), ...)

# S3 method for incidence
fit_two_distr(data, random = smle_binom, aggregated = smle_betabinom,
  n_est = c(random = 1, aggregated = 2), ...)
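The Usage entries above follow R's S3 conventions: fit_two_distr() dispatches on the class of data, and the count and incidence methods mainly supply default fitting functions. The toy generic below illustrates that dispatch pattern. my_fit and its methods are hypothetical, strings stand in for the smle_* functions, and the class vectors c("count", "intensity") and c("incidence", "intensity") are assumptions made for illustration.

```r
# Toy illustration of S3 dispatch. my_fit() and its methods are hypothetical
# stand-ins for fit_two_distr(); strings replace the smle_* fitting functions.
my_fit <- function(data, ...) UseMethod("my_fit")

# The default method does the work with whatever distributions it is given.
my_fit.default <- function(data, random, aggregated, ...) {
  list(data_class = class(data)[1], random = random, aggregated = aggregated)
}

# Class-specific methods only choose sensible defaults, then delegate.
my_fit.count <- function(data, random = "poisson", aggregated = "nbinom", ...) {
  my_fit.default(data, random = random, aggregated = aggregated, ...)
}
my_fit.incidence <- function(data, random = "binomial", aggregated = "betabinomial", ...) {
  my_fit.default(data, random = random, aggregated = aggregated, ...)
}

# Assumed class vectors, for illustration only.
counts <- structure(list(x = c(0, 2, 5)), class = c("count", "intensity"))
incid <- structure(list(i = c(1, 0, 3), n = c(5, 5, 5)),
                   class = c("incidence", "intensity"))
str(my_fit(counts))
str(my_fit(incid, aggregated = "custom"))  # defaults can be overridden
```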

Arguments

data

An intensity object.

...

Additional arguments to be passed to other methods.

random

Function used to fit the distribution describing random patterns (by default, smle_pois for count data and smle_binom for incidence data).

aggregated

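Function used to fit the distribution describing aggregated patterns (by default, smle_nbinom for count data and smle_betabinom for incidence data).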

n_est

Named vector giving the number of estimated parameters of each distribution (random and aggregated).

Value

An object of class fit_two_distr, which is a list containing at least the following components:

call: The function call.
name: The names of both distributions.
model: The outputs of the fitting process for both distributions.
llr: The result of the log-likelihood ratio test.

Other components can be present such as:

param: A numeric matrix of estimated parameters (which can be printed using printCoefmat).
freq: A data frame or a matrix with the observed and expected frequencies of both distributions for the different categories.
gof: Goodness-of-fit tests for both distributions (typically chi-squared goodness-of-fit tests).

Details

Under the hood, fit_two_distr relies on the smle utility, which is a wrapper around the optim procedure.
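To make this concrete, here is a hypothetical wrapper in the spirit of smle: it minimises a negative log-likelihood with optim() and derives standard errors from the numerical Hessian. simple_mle and ll_betabinom are illustrative names, not epiphy functions. The beta-binomial uses a (prob, theta) parameterisation with shape parameters prob / theta and (1 - prob) / theta, assumed here to match the parameter names used in the examples. Both parameters are optimised on unconstrained scales (logit and log), so the reported standard errors refer to those scales.

```r
# Hypothetical wrapper in the spirit of smle (NOT epiphy's implementation):
# minimise a negative log-likelihood with optim() and return estimates,
# standard errors from the numerical Hessian, and the maximised log-likelihood.
simple_mle <- function(nll, start, ...) {
  fit <- optim(start, nll, method = "BFGS", hessian = TRUE, ...)
  list(par = fit$par,
       se = sqrt(diag(solve(fit$hessian))),
       loglik = -fit$value,
       convergence = fit$convergence)
}

# Beta-binomial log-density with mean prob and aggregation parameter theta
# (shape parameters prob / theta and (1 - prob) / theta), written in base R.
ll_betabinom <- function(i, n, prob, theta) {
  a <- prob / theta
  b <- (1 - prob) / theta
  lchoose(n, i) + lbeta(i + a, n - i + b) - lbeta(a, b)
}

# Simulated incidence data: 100 sampling units of 20 plants each.
set.seed(1)
n <- rep(20, 100)
i <- rbinom(100, size = n, prob = rbeta(100, 2, 6))

# Optimise on unconstrained scales: logit(prob) and log(theta).
nll <- function(par) -sum(ll_betabinom(i, n, plogis(par[1]), exp(par[2])))
res <- simple_mle(nll, start = c(qlogis(mean(i / n)), log(0.1)))
c(prob = plogis(res$par[1]), theta = exp(res$par[2]))
```

The simulated data use Beta(2, 6) plant-level probabilities, i.e. prob = 0.25 and theta = 1 / (2 + 6) = 0.125 under this parameterisation, so the estimates should land near those values.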

Note that warnings about chi-squared goodness-of-fit tests may appear if any expected count is less than 5 (Cochran's rule of thumb).
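One common way to respect Cochran's rule is to pool sparse classes before computing the statistic. The minimal sketch below, on simulated Poisson counts, pools only the upper tail until every expected count is at least 5, and removes one degree of freedom for the estimated lambda (the usual correction for parameters estimated from the data). This is an illustration, not epiphy's procedure. For comparison, base R's chisq.test(x, p = ...) uses length(x) - 1 degrees of freedom without that correction and warns that the approximation may be incorrect when any expected count is below 5.

```r
# Minimal sketch: chi-squared goodness-of-fit test of a Poisson fit, pooling
# the upper tail until every expected count is at least 5 (Cochran's rule).
set.seed(3)
x <- rpois(150, lambda = 2)
lambda_hat <- mean(x)
n_est_random <- 1  # one parameter (lambda) estimated from the data

# Observed and expected frequencies for classes 0, 1, ..., k - 1 and ">= k".
freq_table <- function(x, lambda, k) {
  obs <- c(tabulate(x + 1L, nbins = k), sum(x >= k))
  p <- c(dpois(0:(k - 1), lambda), ppois(k - 1, lambda, lower.tail = FALSE))
  list(observed = obs, expected = length(x) * p)
}

# Start with one class per observed value, then merge the upper tail.
# (Only the upper tail is pooled; sparse low classes are not handled here.)
k <- max(x) + 1
tab <- freq_table(x, lambda_hat, k)
while (k > 2 && min(tab$expected) < 5) {
  k <- k - 1
  tab <- freq_table(x, lambda_hat, k)
}

# One degree of freedom is lost per estimated parameter.
stat <- sum((tab$observed - tab$expected)^2 / tab$expected)
dof <- length(tab$observed) - 1 - n_est_random
p_value <- pchisq(stat, df = dof, lower.tail = FALSE)
c(statistic = stat, df = dof, p.value = p_value)
```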

References

Madden LV, Hughes G. 1995. Plant disease incidence: Distributions, heterogeneity, and temporal analysis. Annual Review of Phytopathology 33(1): 529–564. doi:10.1146/annurev.py.33.090195.002525

Examples

# NOT RUN {
# Simple workflow for count data:
my_data <- count(arthropods)
my_data <- split(my_data, by = "t")[[3]]
my_res  <- fit_two_distr(my_data)
summary(my_res)
plot(my_res)

# Simple workflow for incidence data:
my_data <- incidence(tobacco_viruses)
my_res  <- fit_two_distr(my_data)
summary(my_res)
plot(my_res)

# Note that there are other methods to fit some common distributions.
# For example for the Poisson distribution, one can use glm:
my_arthropods <- arthropods[arthropods$t == 3, ]
my_model <- glm(i ~ 1, family = poisson, data = my_arthropods)
lambda <- exp(coef(my_model)[[1]]) # equals mean(my_arthropods$i); unique(my_model$fitted.values) works also.
lambda
# ... or the fitdistr function in MASS package:
library(MASS)
fitdistr(my_arthropods$i, "poisson")

# For the binomial distribution, glm still works:
my_model <- with(tobacco_viruses, glm(i/n ~ 1, family = binomial, weights = n))
prob <- logit(coef(my_model)[[1]], rev = TRUE) # inverse logit, like base R's plogis()
prob # equals with(tobacco_viruses, sum(i) / sum(n))
# ... but the binomial distribution is not yet recognized by MASS::fitdistr.

# Examples featured in Madden et al. (2007).
# p. 242-243
my_data <- incidence(dogwood_anthracnose)
my_data <- split(my_data, by = "t")
my_fit_two_distr <- lapply(my_data, fit_two_distr)
lapply(my_fit_two_distr, function(x) x$param$aggregated[c("prob", "theta"), ])
lapply(my_fit_two_distr, plot)

my_agg_index <- lapply(my_data, agg_index)
lapply(my_agg_index, function(x) x$index)
lapply(my_agg_index, chisq.test)

# }
