cpglm: Compound Poisson Generalized Linear Model

Description

This function fits compound Poisson generalized linear models.

Usage

cpglm(formula, link = "log", data, weights, offset, 
          subset, na.action = NULL, contrasts = NULL, 
          control = list(), chunksize = 0, ...)

Arguments

formula

an object of class formula. See also glm.

link

a specification for the model link function. This can be either a literal character string or a number. If it is a character string, it must be one of "log", "identity", "sqrt" or "inverse". If it is numeric, it is the same as the link.power
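
As a sketch of how the character and numeric forms of link may relate: the sentence above is cut off, so this assumes the numeric value follows the power-link convention of the tweedie family in the statmod package. The symbol $q$ for the link power is notation introduced here, not taken from the original text.

```latex
g(\mu) =
\begin{cases}
  \mu^{q}, & q \neq 0, \\
  \log \mu, & q = 0.
\end{cases}
```

Under that assumed convention, "log" corresponds to $q = 0$, "identity" to $q = 1$, "sqrt" to $q = 0.5$ and "inverse" to $q = -1$.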

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model.

weights

an optional vector of weights. Should be NULL or a numeric vector. When it is numeric, it must be positive. Zero weights are not allowed in cpglm.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. Another possible value is NULL, no actio

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. One or more offset terms can be included in the for

contrasts

an optional list. See the contrasts.arg argument of model.matrix.default.

control

a list of parameters for controlling the fitting process. See 'Details' below.

chunksize

an integer that indicates the size of chunks for processing the data frame as used in bigglm. When it is greater than 0, bigglm is used to fit the GLM, where large data sets ca

...

additional arguments to be passed to bigglm. Not used when chunksize = 0. The maxit argument defaults to 50 in cpglm if not specified.

Value

cpglm returns an object of class "cpglm". See cpglm-class for details of the return values as well as various methods available for this class.

Details

This function implements the profile likelihood approach for Tweedie compound Poisson generalized linear models. The index and dispersion parameters are estimated first by maximizing the profile likelihood; the mean parameters are then estimated using a GLM with the index parameter fixed at that estimate. Computing the profile likelihood requires the numerical methods provided in the tweedie package for approximating the density of the compound Poisson distribution.

The function tweedie.profile in that package also implements the profile likelihood approach. cpglm differs from it in two respects. First, the user does not need to specify a grid of candidate values for the index parameter; the optimization of the profile likelihood is automated. Second, large data sets can be handled, because the GLMs can be fitted with the bigglm function from the biglm package. bigglm is invoked when the argument chunksize is greater than 0. Note also that only the maximum likelihood estimate of $\phi$ is provided here, whereas tweedie.profile offers several other estimators.

The package previously implemented a second approach based on the Monte Carlo EM algorithm, but it has been removed because it offers no obvious advantage over the profile likelihood approach for this model.

The control argument is a list of parameters that control the fitting process.
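
The two-stage estimation described in this section can be written compactly as follows. This is a sketch in notation introduced here, not taken from the package: $p$ is the index parameter, $\phi$ the dispersion, $\beta$ the mean parameters, $g$ the link function, $f$ the Tweedie compound Poisson density, and $\ell_P$ the profile log-likelihood. Weights and offsets are omitted for brevity.

```latex
% Model: Tweedie compound Poisson GLM with index 1 < p < 2
\operatorname{E}(y_i) = \mu_i, \qquad
\operatorname{Var}(y_i) = \phi\,\mu_i^{p}, \qquad
g(\mu_i) = x_i^{\top}\beta .

% Stage 1: for a given p, \hat\beta(p) is the GLM fit; profile over (p, \phi)
\ell_P(p, \phi) = \sum_{i=1}^{n} \log f\bigl(y_i;\ \hat\mu_i(p),\ \phi,\ p\bigr),
\qquad
(\hat p, \hat\phi) = \arg\max_{p,\,\phi}\ \ell_P(p, \phi).

% Stage 2: mean parameters from the GLM at the estimated index
\hat\beta = \hat\beta(\hat p).
```

The GLM estimate of $\beta$ depends on $p$ but not on $\phi$, which is why the mean parameters can be profiled out with an ordinary GLM fit at each candidate value of $p$.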

References

Dunn, P.K. and Smyth, G.K. (2005). Series evaluation of Tweedie exponential dispersion model densities. Statistics and Computing, 15, 267-280.

Examples


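library(cplm)     # provides cpglm and the fineroot data
library(tweedie)  # provides ptweedie, used below

fit1 <- cpglm(RLD ~ factor(Zone) * factor(Stock),
  data = fineroot)
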
# residual and qq plot
parold <- par(mfrow = c(2,2), mar = c(5,5,2,1))
# 1. regular plot
r1 <- resid(fit1) / sqrt(fit1$phi)
plot(r1 ~ fitted(fit1), cex = 0.5)
qqnorm(r1, cex = 0.5)
# 2. quantile residual plot to avoid overlapping
u <- ptweedie(fit1$y, fit1$p, fitted(fit1), fit1$phi)
u[fit1$y == 0] <- runif(sum(fit1$y == 0), 0, u[fit1$y == 0])
r2 <- qnorm(u)
plot(r2 ~ fitted(fit1), cex = 0.5)
qqnorm(r2, cex = 0.5)
par(parold)

# use bigglm
fit2 <- cpglm(RLD ~ factor(Zone),
  data = fineroot, chunksize = 250)
