cpglm(formula, link = "log", data, weights, offset, subset,
    na.action = NULL, contrasts = NULL, control = list(),
    chunksize = 0, optimizer = "nlminb", ...)

formula: an object of class "formula": a symbolic description of the model to be fitted. See also glm.
link: a character string or numeric value specifying the model link function; a numeric value is interpreted as the link.power argument in the tweedie function. The default is link = "log".
data: an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model.
weights: NULL or a numeric vector. When it is numeric, all weights must be positive; zero weights are not allowed in cpglm.
na.action: a function indicating what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. Another possible value is NULL, meaning no action. The value na.exclude can be useful.
offset: NULL or a numeric vector of length equal to the number of cases, specifying an a priori known component of the linear predictor. One or more offset terms can be included in the formula instead or as well; if more than one is specified, their sum is used.
contrasts: an optional list. See the contrasts.arg of model.matrix.default.
chunksize: an integer giving the size of chunks used to process the data frame, as in bigglm. The value of this argument also determines how the model is estimated. When it is 0 (the default), the regular Fisher scoring algorithm is used, which may run into memory issues on large data sets. A value greater than 0 indicates that bigglm is employed to fit the GLMs. bigglm relies on a bounded-memory regression technique and is thus well suited to large-data GLMs.
...: additional arguments passed to bigglm. Not used when chunksize = 0. The maxit argument defaults to 50 in cpglm if not specified.
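As an illustration of how these arguments fit together, a minimal call might look like the following sketch. The weights vector here is hypothetical and only demonstrates where such a term goes; FineRoot is the data set used in the examples below.

```r
library(cplm)  # provides cpglm() and the FineRoot data

# Hypothetical unit weights: weights must be a positive numeric
# vector (zero weights are not allowed in cpglm).
w <- rep(1, nrow(FineRoot))
fit <- cpglm(RLD ~ factor(Zone), link = "log",
             data = FineRoot, weights = w)
coef(fit)
```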
cpglm returns an object of class "cpglm". See cpglm-class for details of the return values as well as various methods available for this class.
This approach relies on the tweedie package for approximating the density of the compound Poisson distribution; indeed, the function tweedie.profile in that package also implements the profile likelihood approach. The cpglm function differs from tweedie.profile in two respects. First, the user does not need to specify a grid of possible values for the index parameter; instead, the optimization of the profile likelihood is automated. Second, big data sets can be handled, in which case the bigglm function from the biglm package is used to fit the GLMs. bigglm is invoked when the argument chunksize is greater than 0. Note also that only the MLE of the dispersion parameter is provided here, while tweedie.profile offers several other possibilities.

The package used to implement a second approach based on the Monte Carlo EM algorithm, but it has been removed because it offered no clear advantage over the profile likelihood approach for this model.
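For comparison, the grid-based search that cpglm automates can be carried out by hand with tweedie.profile; the sketch below assumes the tweedie package is installed, and the grid of index values is illustrative only.

```r
library(cplm)     # for the FineRoot data
library(tweedie)  # for tweedie.profile()

# Profile the likelihood over a user-supplied grid of index values;
# cpglm performs an equivalent search internally without a grid.
out <- tweedie.profile(RLD ~ factor(Zone), data = FineRoot,
                       p.vec = seq(1.2, 1.8, by = 0.1),
                       do.plot = FALSE)
out$p.max  # profile-likelihood estimate of the index parameter
```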
The control argument is a list that can supply various controlling elements used in the optimization process, and it has the following components:
bound.p: a vector of lower and upper bounds for the index parameter p used in the optimization. The default is c(1.01, 1.99).
trace: if greater than 0, tracing information on the progress of the optimization is printed. For optimizer = "nlminb" or optimizer = "L-BFGS-B", this is the same as the trace control parameter, and for optimizer = "bobyqa", this is the same as the iprint control parameter. See the corresponding documentation for details.
max.iter: maximum number of iterations allowed in the optimization. The default is 300.
max.fun: maximum number of function evaluations allowed in the optimizer. The default is 2000.
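A sketch of how these control components might be supplied in practice; the bounds, trace level, and iteration budget shown are illustrative values, not recommendations.

```r
library(cplm)

# Tighten the search interval for the index parameter p, print
# optimizer progress, and raise the iteration and evaluation budgets.
fit <- cpglm(RLD ~ factor(Zone), data = FineRoot,
             optimizer = "nlminb",
             control = list(bound.p = c(1.1, 1.9),
                            trace = 1,
                            max.iter = 500,
                            max.fun = 3000))
```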
cpglm-class, glm, tweedie, and tweedie.profile for related information.
fit1 <- cpglm(RLD ~ factor(Zone) * factor(Stock),
              data = FineRoot)
# residual and qq plot
parold <- par(mfrow = c(2, 2), mar = c(5, 5, 2, 1))
# 1. regular plot
r1 <- resid(fit1) / sqrt(fit1$phi)
plot(r1 ~ fitted(fit1), cex = 0.5)
qqnorm(r1, cex = 0.5)
# 2. randomized quantile residuals to avoid overlapping points at zero
u <- tweedie::ptweedie(fit1$y, fit1$p, fitted(fit1), fit1$phi)
u[fit1$y == 0] <- runif(sum(fit1$y == 0), 0, u[fit1$y == 0])
r2 <- qnorm(u)
plot(r2 ~ fitted(fit1), cex = 0.5)
qqnorm(r2, cex = 0.5)
par(parold)
# use bigglm
fit2 <- cpglm(RLD ~ factor(Zone),
              data = FineRoot, chunksize = 250)